Different Ways to Handle Technical Debt. Part 3: Go Big or Go Home

Updated: Nov 18


In parts 1 and 2 of this series, we covered the inevitable nature of Technical Debt and the options for managing it - either via the Business as Usual (BAU) approach or through a long-running workstream. In this final part, we look at the most radical solution: the "big bang".


This approach can feel drastic and requires serious de-risking. But with meticulous planning and great execution, it can eliminate critical tech debt in one or two major events.

The Catalyst: When a P1 Forces a Reckoning  

My client was running a large, complex enterprise system with a single database instance that had simply grown too big for its boots.  Due to the constant pressure to deliver new features, re-architecting this estate had been continually pushed back.

That finally came to a head when a mandatory database upgrade triggered a P1 incident in Production that caused a much longer outage than planned.

The Brutal Truth: Lessons in Scale and Scope  

Once the dust had settled, the lesson was clear: the upgrade wasn't the real issue.  The problem was the sheer size and complexity of having so many databases in one instance.  Our test environments simply weren't big enough to predict the impact this scale would have.


We couldn't risk that happening again.  The only safe, long-term solution was a complete architecture overhaul: transforming the database from a single instance to multiple, separate instances.  We decided to move to an upgraded engine for better performance and resilience at the same time.  We realised combining both large changes into one "big bang" was far more efficient than staggering the immense workload.

De-Risking the Jump: Preparation, People and Planning

Moving dozens of databases isn't possible without downtime. Since tackling them one by one would have meant a sustained period of disruptive downtime and out-of-hours work, we decided to perform the entire migration during two or three major weekend outage slots.


Success here relied on three things:

  1. Relentless Testing: We quickly settled on the target architecture and then started spiking different migration approaches in lower environments. We timed every migration and used those timings as the basis for our Production estimates. We also tested our rollback plan repeatedly, planning for failure at every step.

  2. Expert Support: Although the internal team were seasoned engineers, they weren't database specialists. We approached our cloud provider, who were happy to lend us one of their database experts. The resulting session was invaluable; it both validated our core approach and gave us alternative methods to test on our data.

  3. Stakeholder Buy-In: The sheer size of the change meant we needed a 24-hour system outage on three consecutive weekends. I worked closely with senior stakeholders to get this precious downtime agreed. Clear, regular communication was vital to ensuring the business and other suppliers in the chain were fully prepared and on board.
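
To make the first point concrete, here is a minimal sketch of how timings from lower environments might be turned into Production estimates and packed into outage windows. It is an illustration, not the tooling we used: the names (TimedRun, plan_windows), the safety factor, the rollback reserve and, above all, the assumption that migration time scales linearly with database size are all hypothetical and would need validating against your own data.

```python
from dataclasses import dataclass


@dataclass
class TimedRun:
    """One timed trial migration in a lower environment (hypothetical shape)."""
    size_gb: float   # size of the database that was migrated
    minutes: float   # measured wall-clock duration


def minutes_per_gb(runs):
    """Return the slowest observed rate, so estimates err on the pessimistic side."""
    return max(run.minutes / run.size_gb for run in runs)


def estimate_minutes(size_gb, rate, safety_factor=1.5):
    """Estimate a Production migration, ASSUMING time grows linearly with size."""
    return size_gb * rate * safety_factor


def plan_windows(db_sizes_gb, rate, window_minutes=24 * 60,
                 rollback_reserve=0.5, safety_factor=1.5):
    """Pack databases into outage windows, largest first (first-fit decreasing).

    rollback_reserve is the fraction of each window held back for rollback.
    """
    usable = window_minutes * (1 - rollback_reserve)
    windows = []
    ordered = sorted(db_sizes_gb.items(), key=lambda item: item[1], reverse=True)
    for name, size_gb in ordered:
        needed = estimate_minutes(size_gb, rate, safety_factor)
        if needed > usable:
            raise ValueError(f"{name} does not fit in a single outage window")
        for window in windows:
            if window["remaining"] >= needed:
                window["remaining"] -= needed
                window["dbs"].append(name)
                break
        else:
            # No existing window has room, so open a new one.
            windows.append({"remaining": usable - needed, "dbs": [name]})
    return windows


if __name__ == "__main__":
    # Illustrative numbers only, not figures from the real migration.
    trial_runs = [TimedRun(size_gb=10, minutes=20), TimedRun(size_gb=50, minutes=75)]
    rate = minutes_per_gb(trial_runs)
    plan = plan_windows({"orders": 100, "billing": 100, "audit": 100}, rate)
    for number, window in enumerate(plan, start=1):
        print(f"Weekend {number}: {window['dbs']}")
```

Taking the slowest observed rate and reserving part of each window for rollback are deliberately conservative choices: if the plan needs more windows than the business has agreed to, that is far better discovered on paper than halfway through an outage.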

Two Weekends to Freedom  

After months of rigorous preparation, testing and constant refinement, we were ready. The migration was completed relatively smoothly over two weekends, without needing the third contingency slot.


The real proof was in the pudding.  Services were back up and running without issue, data integrity was maintained and the team had secured a new, performant platform that was easier to manage.


By taking the "big bang" approach, we eliminated years of underlying tech debt, avoiding the continual hassle of regular downtime and out-of-hours work for the engineers. The business outcome was a stable, performant application built on a foundation they could trust.

Ready to Ditch the Fire-Fighting?

The "big bang" approach, when executed with rigour and expert de-risking, moves you straight from constant fire-fighting to long-term stability and better performance.


If you're currently drowning in accumulated Technical Debt and need a clear, professional view on whether the Big Bang, BAU or workstream approach is right for your organisation, have a word with the team at Next Phase Consultancy. We're here to help you nail down that strategy.

 
 
 
