Engineering Leadership · Jul 22, 2024 · 6 min read

On-Prem to Cloud-Native With Under an Hour of Downtime

Big-bang migrations fail loudly. Here's the incremental, reversible approach I use to move legacy systems to the cloud while the business keeps running.

Migrate · < 1 hour downtime

Every catastrophic migration story starts the same way: a weekend cutover, a rollback plan nobody tested, and a Monday morning that becomes a week. I've re-platformed legacy enterprise systems to Kubernetes across Azure, AWS and GCP and kept downtime under an hour, not by being lucky, but by refusing to do a big bang.

Reversible, always

The governing rule is that every step must be reversible. If a change can't be rolled back in minutes, it gets broken into smaller changes until it can. That single constraint shapes everything else.

  • Strangle, don't replace: route traffic to new services incrementally behind a proxy, leaving the legacy path live.
  • Dual-write and verify: write to old and new data stores in parallel, comparing results before you trust the new one.
  • Shadow traffic: replay production load against the new system with no user impact until it earns confidence.
  • Cut over a slice: move one tenant, one region, one feature at a time, with an instant route back.

Downtime is a function of batch size. Shrink the batch and the risk shrinks with it.
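
To make two of these patterns concrete, here is a minimal Python sketch of slice-by-slice routing with an instant route back, plus a dual-write check. It is an illustration under stated assumptions, not the tooling from any real migration: the names SliceRouter and DualWriter are hypothetical, handlers are plain functions, and the data stores are in-memory dicts standing in for real databases.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SliceRouter:
    """Hypothetical strangler-style router: one tenant is one slice."""
    legacy: Callable[[str], str]   # legacy handler, always left live
    new: Callable[[str], str]      # handler on the new platform
    migrated: Set[str] = field(default_factory=set)  # tenants cut over so far

    def cut_over(self, tenant: str) -> None:
        # Move a single tenant to the new platform.
        self.migrated.add(tenant)

    def roll_back(self, tenant: str) -> None:
        # Legacy is still running, so rollback is only a routing change.
        self.migrated.discard(tenant)

    def handle(self, tenant: str, request: str) -> str:
        handler = self.new if tenant in self.migrated else self.legacy
        return handler(request)


@dataclass
class DualWriter:
    """Hypothetical dual-write wrapper: dicts stand in for real stores."""
    old_store: Dict[str, str] = field(default_factory=dict)
    new_store: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        # Write to both stores; the old store remains the source of truth.
        self.old_store[key] = value
        self.new_store[key] = value

    def diverged_keys(self) -> List[str]:
        # Keys whose values differ (or exist in only one store).
        keys = set(self.old_store) | set(self.new_store)
        return sorted(
            k for k in keys if self.old_store.get(k) != self.new_store.get(k)
        )
```

In use, you would call cut_over for one tenant, watch it, and call roll_back the moment something looks wrong. The dual-write check works the same way in spirit: the new store only becomes trusted once diverged_keys stays empty under real traffic. A production version would sit in a proxy or service mesh and compare stores asynchronously, but the shape of the decision is the same.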

The cutover that wasn't an event

By the time the "final" cutover arrives, almost everything already runs on the new platform. The remaining switch is small, rehearsed, and reversible, so the sub-hour downtime window is real margin, not optimism. On one year-long logistics re-platform we hit zero downtime doing exactly this.

  • <1hr: downtime on enterprise on-prem to cloud cutovers
  • 0: downtime on a year-long logistics re-platform
  • 3: clouds in production (Azure, AWS, GCP)

The cloud isn't the hard part anymore. Doing the move without betting the business on a single weekend is, and that's entirely a function of how you sequence the work.
