Web Development

A pragmatic approach to refactoring legacy projects


NapMap editorial

Most refactoring advice is written for greenfield codebases that don't really need it. The legacy refactoring problem is different in shape. You inherit a codebase that has a working business strapped to it, the original authors are mostly gone, and the parts that scare you the most are also the parts the company depends on. The textbook advice — "establish a comprehensive test suite first" — is technically correct and practically useless, because if you had time to do that, the codebase wouldn't be in this state.

The pragmatic version starts with one question: what change does the business actually want next? Refactoring without a forcing function is a project that quietly never finishes. Refactoring in service of a real feature, on the other hand, is a project that ships its own progress every two weeks. The choice of which feature also tells you which subsystem to refactor first — the one in the path of the next change.

Inside that subsystem, do the smallest possible refactor that lets the new feature be added cleanly. Resist the urge to fix everything you see along the way. Every fix you tack on increases the size of the diff, the risk of regression, and the time to merge. A focused refactor that ships next week beats a sweeping one that ships in two months and has to be unwound when something breaks in production.

The other principle is to build a safety net only as wide as the floor you're walking on. If you're refactoring the billing module, write tests for billing, not for everything billing touches. The infinite version of safety-net building is how legacy refactors die. The minimum version is what gets shipped.
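One way to keep the net that narrow is a characterization test: record what the code returns today and check the refactored version against those recordings. The sketch below is illustrative only. The function `legacy_invoice_total`, its truncating tax calculation, and the recorded cases are hypothetical stand-ins, not taken from any real billing module; in practice you would import the existing function and capture its actual outputs.

```python
# Hypothetical stand-in for a legacy billing function. In real work you would
# import the existing function instead of rewriting it here.
def legacy_invoice_total(line_items, tax_bps):
    """line_items: list of (quantity, unit_price_cents); tax_bps: basis points."""
    subtotal = 0
    for quantity, unit_price_cents in line_items:
        subtotal += quantity * unit_price_cents
    # Assumed legacy quirk: tax is truncated to whole cents, not rounded.
    return subtotal + (subtotal * tax_bps) // 10000


# Inputs paired with whatever the code returns today. The expected values
# describe current behavior, quirks included, rather than a specification.
RECORDED_CASES = [
    (([(2, 500)], 1000), 1100),
    (([(1, 999)], 700), 1068),
    (([], 2000), 0),
]


def check_characterization(fn):
    """Return (args, expected, actual) for every case where fn diverges."""
    mismatches = []
    for args, expected in RECORDED_CASES:
        actual = fn(*args)
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches


if __name__ == "__main__":
    # An empty list means the function still behaves as recorded.
    print(check_characterization(legacy_invoice_total))
```

Run the same check against the refactored function before merging. The cases cover billing only, so the net stays as wide as the code being changed.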

Use commit hygiene as a debugging tool. A refactor that's broken into ten focused commits lets you binary-search for the commit that broke something; a refactor that's one giant commit gives you nothing to search. This matters more in legacy work than in greenfield, because the historical context is harder to reconstruct from scratch.
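The binary search can be sketched in a few lines. Git's `git bisect` command automates the same idea against a real repository; the version below is only a simplified model. The commit names and the `is_broken` check are hypothetical, and it assumes the first commit in the list is known good and the last is known broken.

```python
def first_bad_commit(commits, is_broken):
    """Find the first broken commit by binary search.

    commits: ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed broken.
    is_broken: callable that checks out and tests one commit (hypothetical).
    Returns (first_broken_commit, number_of_checks_performed).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one broken commit")
    good, bad = 0, len(commits) - 1
    checks = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        checks += 1
        if is_broken(commits[mid]):
            bad = mid   # the break is at mid or earlier
        else:
            good = mid  # the break is after mid
    return commits[bad], checks


if __name__ == "__main__":
    # Ten focused commits on top of a known-good base; pretend c7 broke things.
    history = ["base"] + [f"c{i}" for i in range(1, 11)]
    culprit, checks = first_bad_commit(
        history, lambda commit: history.index(commit) >= 7
    )
    print(culprit, checks)  # prints "c7 3"
```

With the whole refactor squashed into one commit, the list is just `["base", "big"]`, so the search returns `"big"` after zero checks and tells you nothing you didn't already know.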

There's a hard truth about legacy refactoring: some code should not be refactored. Some code is bad in ways that don't matter to the business — slow, ugly, hard to read, but stable, well-understood by the few people who touch it, and quietly load-bearing in ways that aren't documented anywhere. Touching that code costs more than it saves. Recognizing the difference between "should be refactored" and "should be left alone with a comment explaining why" is one of the most senior skills in this work, and not enough engineers are taught it.

The article we'd point you to is unusually honest about the failure modes of legacy refactoring projects. It walks through three case studies: two that the team would not repeat, and one that almost worked but had to be rolled back when a downstream system broke unexpectedly. The retrospective on the rollback is the section we keep coming back to.

