If you've ever managed any software project, you've probably asked yourself: how could our teams move faster? How fast are we moving today?
For these kinds of questions, it's tempting to turn to metrics. After all, we use metrics routinely and successfully when we develop software: performance, production load, and uptime metrics, as well as metrics based on customer behavior, like conversion and retention. These metrics don't just provide visibility. More importantly, they create a feedback loop: we can make a change intended to improve something, then use metrics to see how much improvement there really was. Developer wisdom says that every software performance optimization must start with a measurement, which makes perfect sense.
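To show how natural that loop feels in practice, here's a minimal sketch of the measure-first habit in Python. The helper and the workload are made up for illustration; the point is only the measure, change, measure-again cycle:

```python
import time

def measure_ms(fn, *args, repeats=5):
    """Run fn several times and return the best wall-clock time in ms."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best * 1000

# Measure, make a change, measure again: the metric closes the feedback loop.
data = list(range(100_000, 0, -1))
print(f"baseline: {measure_ms(sorted, data):.1f} ms")
```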
Since metrics are so helpful, can't we apply them to software development speed? Better development processes should improve development output, so output metrics might tell us whether a process change really helped. So then, what metrics could we use?
Development speed is the amount of work produced per unit of time, so we need to measure both output and time. Measuring time is simple, no problem there. What about work output? Attempts to measure it are as old as software itself, and through the years one thing has held true: whenever we decide a metric measures work output, something unintended soon follows. Reward lines of code, and the code bloats. Reward the number of commits, and the commits get smaller. Reward the number of completed tasks, and the tasks get sliced ever thinner.
Developers are smart. They specialize in cracking complex problems. Give them any metric, and they'll find the easiest way to improve it, and that easiest way most likely won't correlate with work quality or the desired project outcome. This doesn't necessarily mean developers will game the metric; that depends on the context and how strong the incentives are. But one thing is certain: developers will realize that the measure of their productivity is disconnected from what matters. That is not only frustrating, it also distracts them from doing the real work.
Why do metrics work so well for software products, and not for measuring developer output? Is it some kind of developer conspiracy? Actually, if you look outside of software development, you'll find more examples where metrics work well, and where they don't.
Where metrics work well: manufacturing and sales. Take the manufacture and sale of cups. You can measure production output (the number of cups produced per unit of time) and production quality (the percentage of cups that fail quality control). On the sales side, you can measure sales volume and profit margin. These metrics are genuinely helpful for management. For example, the goal for the manufacturing department can be to improve the percentage of cups that pass quality control while keeping unit costs low, and the head of sales can aim to increase sales volume or improve profit margins. Improvements in these metrics are good for business, so we can also treat them as a measure of the efficiency of the corresponding departments.
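To make this concrete, here's a back-of-the-envelope calculation with invented numbers (nothing here describes a real factory):

```python
# Manufacturing: output and quality.
produced, passed = 10_000, 9_600
defect_rate = 1 - passed / produced        # 4.0% failed quality control

# Sales: volume and margin.
units_sold, price, unit_cost = 9_000, 3.50, 2.10
revenue = units_sold * price
margin = (price - unit_cost) / price       # 40% profit margin per cup

print(f"defect rate: {defect_rate:.1%}, "
      f"revenue: ${revenue:,.0f}, margin: {margin:.0%}")
```

Every quantity above is built from interchangeable units (identical cups, identical dollars), which is exactly why these metrics behave so well.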
Where metrics don't work: measuring scientific output. Scientists publish their research in articles, and science has its own metrics for work output and quality: the number of articles published, the number of citations, the statistical significance of results. Can we say that a scientist who published 10 articles produced twice as much value as one who published 5? Unlikely. Works differ too much in their value; even without numbers, it's often hard to say which of two results is more valuable. And because gaming one's publication and citation counts is a well-known problem in the scientific community, those numbers aren't considered reliable indicators of productivity. Statistical significance has its own issues: p-hacking is a widespread problem.
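To see why significance alone is a weak signal, here's a small self-contained simulation of the multiple-testing flavor of p-hacking. The setup is invented for illustration: both groups are drawn from the same distribution, so every "significant" result is pure noise, yet running enough tests reliably produces some:

```python
import random
import statistics
from math import erf, sqrt

def p_value(a, b):
    """Two-sided z-test for a difference in means (large-sample approximation)."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

random.seed(42)
experiments, false_positives = 100, 0
for _ in range(experiments):
    # Both samples come from the SAME distribution: there is nothing to find.
    a = [random.gauss(0, 1) for _ in range(50)]
    b = [random.gauss(0, 1) for _ in range(50)]
    if p_value(a, b) < 0.05:
        false_positives += 1

# Expect roughly 5 "discoveries" out of 100 null experiments. Run many
# tests and report only the hits, and significance is easy to manufacture.
print(f"{false_positives} of {experiments} null experiments look 'significant'")
```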
In any context, metrics that work well share two important criteria:

- They relate directly to value: improving the metric means producing more of what the business actually wants.
- They measure consistent values: one unit of the metric is interchangeable with any other.
Let's take a look at the examples above:
Metrics in manufacturing and sales satisfy both criteria. In cup manufacturing, value is embodied in the cups themselves, and since they're mass-produced, the cups are identical. In sales, value is measured in dollars; the business goal is to make money, so the metric's relationship with business value is as direct as possible. And since one dollar is equal to any other, money-based metrics measure consistent values.
In science, these criteria aren't satisfied. There is no metric that measures the value of scientific results directly; we have only indirect metrics, like article and citation counts, which can be gamed. And these metrics aren't consistent either, because publications are not interchangeable units.
What do we have to measure developer output? Lines of code, number of commits, number of tasks completed, man-hours, story points… Check these against the two key criteria above, and you'll find that:

- none of them relates directly to value: more code, more commits, or more hours don't mean more of what users or the business need;
- none of them is consistent: one line of code, one task, or one story point can differ from another by orders of magnitude in difficulty and importance.
It's no surprise that none of these metrics work well.
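A toy illustration of that inconsistency: the two functions below do exactly the same work, yet one is several times longer in lines of code (and, depending on who writes them, commits and story points). Which developer was more productive?

```python
def total_even_verbose(numbers):
    """Sum the even numbers: six lines of code."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n
    return total

def total_even_terse(numbers):
    """Sum the even numbers: two lines of code, same value delivered."""
    return sum(n for n in numbers if n % 2 == 0)

assert total_even_verbose(range(10)) == total_even_terse(range(10)) == 20
```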
Why don't we have metrics for developers that relate directly to value? For the same reason we don't have any for scientists. Developers, just like scientists, always create something new. They don't write the same code again and again; that wouldn't make sense, since code can be re-used in a variety of ways: extracted into a module or a library, or, as a last resort, simply copied and pasted. Every developer's workday is unique. Even when developers solve similar problems, they solve them in a different context, or in a new way, each time.
Of course, no one today seriously talks about measuring developers' output in lines of code. There should be something more modern, right?
The book Accelerate, published in 2018, presents research on some 2,000 organizations of different sizes. The goal of the research was to identify which metrics differentiate high performers from low performers. Here's what they found: four key metrics of software delivery performance.

- Lead Time: how long it takes to go from code committed to code successfully running in production.
- Deployment Frequency: how often the organization deploys to production.
- Mean Time to Restore (MTTR): how quickly service is restored after an incident.
- Change Fail Percentage: what share of changes to production result in failures.
Source: Nicole Forsgren, Jez Humble, and Gene Kim, "Measuring Performance," in Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations
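For concreteness, here's a sketch of how these four metrics might be computed from a team's deployment records. The record shape is invented for illustration, not taken from the book:

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment log: when code was committed, when it was deployed,
# whether the change failed in production, and when service was restored.
deployments = [
    {"committed": datetime(2024, 1, 1, 9),  "deployed": datetime(2024, 1, 2, 15),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 3, 10), "deployed": datetime(2024, 1, 3, 18),
     "failed": True,  "restored": datetime(2024, 1, 3, 19)},
]
period_days = 30

# Deployment Frequency: deployments per day over the period.
frequency = len(deployments) / period_days

# Lead Time: commit-to-production delay, averaged. Note the inconsistency the
# article points out: a one-line fix and a month-long feature each count once.
lead_time_h = mean((d["deployed"] - d["committed"]).total_seconds() / 3600
                   for d in deployments)

# Change Fail Percentage: share of deployments that caused a failure.
fail_rate = sum(d["failed"] for d in deployments) / len(deployments)

# Mean Time to Restore: how quickly failed deployments were fixed.
restore_h = [(d["restored"] - d["deployed"]).total_seconds() / 3600
             for d in deployments if d["failed"]]
mttr_h = mean(restore_h) if restore_h else 0.0

print(f"freq={frequency:.2f}/day  lead={lead_time_h:.1f}h  "
      f"fail={fail_rate:.0%}  mttr={mttr_h:.1f}h")
```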
We can see four metrics here. Let's see how each of them relates to value, and whether it's consistent:

- Lead Time relates to value only indirectly (delivering the wrong thing faster isn't value), and it's inconsistent: tasks differ wildly in size, so one lead time isn't comparable to another.
- Deployment Frequency counts deployments, not value, and deployments aren't interchangeable: one ships a critical feature, another a one-line tweak.
- Mean Time to Restore measures recovery from failures, not development output, and incidents differ too much for the average to be consistent.
- Change Fail Percentage measures a kind of quality rather than value produced, and changes aren't uniform units either.
Bottom line: all four metrics are inconsistent, and they don't always relate directly to value. Are they prone to gaming? Sure. Just ship trivial changes as frequently as possible, and every metric except Lead Time will look great.
As for Lead Time, even if we ignore the (important) fact that it's inconsistent, setting it as a goal would push teams to prioritize the simplest customer requests and ignore everything customers didn't ask for: refactoring, tests, and all the improvements customers hadn't thought of.
That's why I wouldn't recommend using these metrics as development goals.
You might say: wait, just because good metrics haven't been found yet doesn't mean they can't be found at all! People are smart; they'll come up with something new that works better. Well, I'm afraid they won't. There's a fundamental reason why we don't have good metrics for developer performance. Good metrics would have to satisfy the two key criteria:

- relate directly to value;
- measure consistent, interchangeable values.
We can't measure developers' output directly, because their results are always different: each task and project has unique requirements, so there are no repeating results, and without repeating results we simply have no reliable foundation for measurement. All we have are indirect metrics, which don't always correlate with value and are prone to gaming. Using them as goals ends up causing more harm than good.
Metrics are convenient because they provide a feedback loop: you learn whether your changes improved anything. Without metrics, the feedback loop is less straightforward, and sometimes you may even feel you're flying blind. There's a famous saying attributed to Peter Drucker:
If you can't measure it, you can't manage it.
This isn't true, though. According to the Drucker Institute, Drucker never actually said it, and he was under no illusion that a metric can be found for everything. Not everything that matters can be measured, and not everything that can be measured matters.
Not having good metrics doesn't mean we can't improve development speed. Some companies clearly build software faster than others without sacrificing quality, so improvement must be possible.
You can and should improve your software product with metrics. Performance metrics, like latency or CPU load, reliability metrics, like uptime, and user behavioral metrics, like conversion or retention, are your friends.
However, you shouldn't rely on metrics when trying to boost development speed, because there are no good metrics for that. We can measure lots of things, but everything we can measure either doesn't relate directly to value, or doesn't yield consistent values, or both. Set goals based on such metrics, and nothing good happens.
But don't worry, there's hope! The lack of good metrics for development speed doesn't mean we can't go faster. We definitely can. One of the most important things that can help us develop faster is improving communication between developers and managers. In the article linked above, we talk about why this is important and give concrete examples of what can be improved and how.