Understanding Performance

Alice completes half as many story points as anyone else on the team. She uses TDD and has never shipped a bug. Her code is clean, easy to review, easy to maintain, and she’s never caused an outage. Is she slow?

Bob completes twice as many story points as anyone else on the team. His PRs are messy, hard to understand, and take a lot of time to review. He frequently has to fix bugs in his code, which results in multiple re-reviews. His unit tests are mediocre, he frequently ships regressions, and he’s directly caused at least two outages. His code is hard to maintain, and a lot of what he writes is instant tech debt. But he ships a lot. Is he fast?

Carol is the most knowledgeable person on the team. She reviews all the team’s PRs, monitors the support and alert Slack channels, and answers all of the team’s technical questions. Whenever there’s a problem, she jumps in and handles it. Everyone on the team – and the company – has come to depend on her, and no one else knows how to manage several critical systems. People joke that the team should have a code freeze whenever she’s on vacation. Is Carol performing well?

Dara doesn’t complete many tasks, but things just work better when he’s around. He always seems to know when a project is in trouble, and somehow manages to get things unblocked before they blow up. He’s written or updated a lot of the team’s documentation and runbooks, onboarded multiple engineers, and watched over the interns to make sure they didn’t break too many things. He’s a great interviewer, and has referred several fantastic hires. He’s written a couple of small tools that have become crucial parts of the team’s workflow… But he just doesn’t ship much. Is he a poor performer?

Elena is a Product Manager. She uses ChatGPT to generate her PRDs, status reports, and proposals. It takes her minutes to do work that would take other people hours. All of her documents are extremely long and detailed, and have a strangely vague quality that makes them difficult to understand. While it’s fast for her to generate the documents, it takes readers a long time to read and understand them. Like Bob, she produces far more deliverables than her peers. And like Bob, she costs everyone around her time and energy. How is she performing?

Fran is a brilliant coder. He’s fast, he has high quality, he knows when to rearchitect and when to do a quick and dirty hack, he knows where all the bodies are buried, and frequently has the key insight in moments of crisis. He’s also a bit of a jerk, and putting him on a team destroys morale, reduces everyone else’s output, and in one case caused another employee to rage quit. Does everyone else just need to get over themselves and learn how to work with him?

Each of the above case studies sounds contrived, but if you manage people long enough, you will absolutely see examples of each of them. And while the framing might make it sound like I’m trying to convince you that it’s easy to evaluate their performance, the reality is that there’s no one right answer (except for Fran, who clearly has to go). Sometimes quality is critical. Sometimes you just need to get something out the door. Sometimes you need to spend the time to rearchitect. Every industry, company, and project has different requirements.

That said, there are a couple of general guidelines. First, the most important unit of productivity is the team, not the individual. Yes, individual velocity provides important context for understanding performance, but the most important thing is how the team as a whole is doing. An engineer who delivers less individually but drives greater team velocity is creating net value. A team member who individually delivers a lot but costs everyone around them time is engaging in productivity arbitrage and dragging down the team.

Likewise, a project’s “level of effort” isn’t just a measure of how much time it takes the engineer to write and merge the code. It also includes the time it takes for other people to code review, test, and deploy it; time to respond to customer complaints; to identify, document, and fix bugs; and to rearchitect or retire technical debt over time. Clean, high-quality code with strong test coverage takes more time to write, but results in less long-term effort and greater long-term productivity. Low-quality code, messy code reviews, bugs, and outages all take time away from delivering customer value. Clear, concise, well-thought-out documentation saves time. Long, vague, hard-to-understand documentation costs time and adds mental load.

Specific productivity-related externalities¹ are hard to measure, so we generally go with our gut and give everyone mixed messages. Bob is fast, but we don’t trust his deliverables, so we give him a mixed end-of-year review – he walks away with a message that he’s doing great, but he’s also failing. We tell Carol that she needs to document her knowledge, train the rest of the team, and intentionally step back to let other people address crises—but we also keep rewarding her for heroics. Alice and Dara are both seen as well-liked low performers, and when they eventually leave the company we don’t understand why the team suddenly starts getting unlucky, all the time.

You can also see this with AI. People always point to projects where they saved time as proof that it’s improving velocity, but laugh off the cases where it got things wrong, led them down a bad path, or wasted time in prompt carperpetuation. The fast coding anecdata is seen as definitive; the wasted time, hallucinations, and misunderstandings as aberrations. The experienced engineer who’s effectively incorporated AI into their workflow is seen as normative, while slop slingers like Bob are seen as outliers. But all of this is happening to everyone, and it’s magical thinking to believe either a) that all of your engineers are “the good kind”, or even b) that the “good” engineers only see productivity gains from AI.

So what do we do? If we’re only looking at one person, one project, or one sprint, then we’re missing the big picture. If we’re going to be honest, we need to look at longitudinal data. Which is to say, we should be looking at average engineering output across the team or organization over a three to six month period. The longer the better. Don’t count bugs, don’t count outage-related activity—we don’t reward failure. Now, how is your team doing? What have they delivered? Are they making an impact? Are they accelerating or slowing down? It’s hard or impossible to measure engineering productivity over a short time span, but over the long term, trends should start to appear.


  1. In economics, an “externality” is a side effect of economic activity. So, for instance, a homeowner living near a polluting factory might have a worse quality of life, higher health care costs, a higher chance of cancer, etc. These aren’t the intended results of the factory’s activity, but are nevertheless a natural and direct result. Likewise, an engineer who delivers poor-quality code costs everyone around them extra time in code review and bug fixes.
