So you start your own company, you’re living the dream, struggling, eating canned soup and praying desperately for your big break. And it comes! Things work out, and your company thrives. Or perhaps you join somewhere along the way, learn the tech stack and culture, get enmeshed, make a contribution. But things that worked well for two people don’t work as well when you’re at a hundred, or a thousand, and somewhere along the line the cracks begin to show. Technology decisions made because they were familiar, convenient, or interesting at the time suddenly aren’t scaling with your user base. Or a release process that was ad hoc and relied on everyone knowing everything intimately is no longer tenable. Or security protocols that made sense when there were only two engineers (i.e. none) are completely insane for a company of thousands.
There are three key stages that these types of crises move through:
- Early rumblings
This is the point in the movie where the crisis is happening somewhere else. You’re vaguely aware that there’s something wrong, conspiracy theorists are shouting that the world’s about to end, but you either don’t believe, or think that you can get by for now without paying much attention. And you can, for a while.
- Paralysis
By now, everyone knows that things are going horribly, horribly wrong. The seas are rising, zombies are wandering the streets, and your servers are starting to melt through the data center’s floor. But even with the obvious danger, the cost of action is still way too high – maybe you have to migrate to a new technology, change architectures, violate a core company value – whatever it is, no one with the power to do so is willing to take a stand and save the day.
- Too late
Crises are dangerous, and inaction can be catastrophic. This doesn’t mean your company is going to go down (not all crises are existential), but waiting until all your choices collapse to a point is going to cost you. It’ll cost more time and money than if it had been addressed earlier. It’ll cost morale, since the people who took a chance and spoke up early got burned by the process. It’ll cost future productivity, since the engineers who had to work round the clock to fix the problem are completely burned out. It may cost in key user metrics (revenue, traffic, brand loyalty, etc.), since your users will be frustrated and more likely to try alternatives. And it will cost your company culture, as the company’s narrative of ongoing success (which most companies try to cultivate) will be deeply shaken, driving increased cynicism and decreased loyalty.
The problem is, not every sign of the apocalypse comes to fruition. Sometimes, a useful hack outlives its purpose, but isn’t worth the time and effort to fix (and never will be). Often there’s a real problem, but the proposed solution is way out of line with the requirements (“let’s develop our own proprietary technology to overcome a well-understood and easily solvable problem!”), is painfully naive (“let’s uncritically follow this advice I read in a comment thread on Hacker News!”), or merely serves to identify the advocate’s personal interests (“let’s rewrite the software in Ook!”).
But where does this leave us? I think it can be useful to think of growing, looming problems as being one of two different types: sacred cows or overdue library books. The first type occurs because someone with the ability to say so has decided that this is the one true path. Our company culture states X, and therefore we will fit the solution to every problem to X, no matter what. Sacred cows can be extremely frustrating, because there isn’t much you can do about them, short of a highly public and bruising political fight which you’re likely to lose. There’s an important corollary, though – sacred cows start their lives as decisions, and frequently have good reasons behind them. After a while, they become part of the background noise, and no one thinks to communicate the why any more – they simply are. When you disagree with a sacred cow, take the time to really understand the reasoning behind it. Maybe you’ll agree and be converted. Maybe you’ll gain some respect for the other viewpoint. And if you still choose to fight, at least you’ll have all the facts, and won’t spend time tilting at windmills that only exist in your mind.
Overdue library books are trickier. These have the most potential to go through the three stages outlined above. These are the problems that grow slowly over time, frogs enjoying leisurely baths in gradually warming waters. The problem is that there isn’t one point at which you go from mild discomfort to agonizing pain – it’s a continuum that people experience differently, and the default high-level position is always to prioritize projects that increase revenue, improve user experience, etc. Late-night heroics to keep the servers from falling over, hacks layered on sedimentary hacks to glue together pieces of code that should never know each other, much less intertwingle, and last-minute saves and near misses are always going to have less visibility than product wins. But waiting until the team’s luck runs out before taking a stand is the worst possible strategy. Each team usually knows what beast slouches towards its own personal Bethlehem. For these, the answer is usually to start communicating the problem – and a solution – early, often, and with increasing volume.