Over on projectmanagement.com, my article “Agile: What’s in it for the Project Manager?” has been posted in two installments: part 1 on gathering requirements and work breakdown, and part 2 on interpreting requirements and tracking progress. Projectmanagement.com requires free registration to access the full content.
Experimentation is a powerful learning tool. When I was young, I performed scientific experiments by mixing chemicals together to see what they would do. I learned that most random concoctions from my chemistry set would make a brown liquid that was often hard to clean out of a test tube. I learned that sometimes they would create very smelly brown liquids. These were not really experiments, however, and I didn't learn much from them. They were merely activities, and what I collected from them were anecdotes and experiences.
The scientific method rests on the performance of experiments to confirm or refute a proposed hypothesis. Unless you can propose a hypothesis in advance, you cannot design an experiment to test it. And until you test the hypothesis, you haven't really learned anything.
“In general, we look for a new law by the following process: First we guess it; then we compute the consequences of the guess to see what would be implied if this law that we guessed is right; then we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment, it is wrong. In that simple statement is the key to science. It does not make any difference how beautiful your guess is, it does not make any difference how smart you are, who made the guess, or what his name is — if it disagrees with experiment, it is wrong.” — Richard Feynman, The Character of Physical Law
When we estimate how long it will take, or how much it will cost, to implement a desired amount of software functionality, we create a hypothesis that we can test. Our hypothesis may not have the enduring and universal value of a hypothesis predicting physical laws, but it may still be extremely valuable to us.
For example, suppose we have a number of features we'd like to get into our next software release. And suppose we have a date in mind for that release, and a team ready to work on it. We could then ask that team to estimate the features relative to each other, bucketing them into groups of similar sizes. We could also ask them to estimate how much bigger (or smaller) the features in one size group are than those in another. If this team has previous experience working together, they might be able to guess how long one feature might take to implement. Otherwise, they might just take a guess at it.
I would expect these numbers to be simple, with only one or two significant digits. After all, we don’t have much data to base them on. Their precision should not pretend that we do.
If we were practicing a plan-driven serial software development cycle, we might treat these estimates as promises and try to manage the work to meet them. In that case, I would expect them to have padding for the unknown, and higher precision to hide the fact that they're padded guesses.
Using an empirical software development approach, we’ll instead treat this projection as a first hypothesis. When we finish the first feature, we’ll have some better data on the rate at which we’re progressing, and can project into the future with a bit more confidence. Does this data confirm our hypothesis of when we’ll be done?
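As a rough illustration of such a projection, here's a minimal sketch in Python. The dates, counts, and function name are invented for the example; the point is that the projection is recomputed from observed progress, not from the original guess.

```python
from datetime import date, timedelta

def projected_finish(start: date, today: date,
                     features_done: int, features_total: int) -> date:
    """Project a finish date by assuming the observed rate continues."""
    elapsed_days = (today - start).days
    days_per_feature = elapsed_days / features_done
    remaining = features_total - features_done
    return today + timedelta(days=round(days_per_feature * remaining))

# Hypothetical numbers: 3 of 20 features done after 15 days.
print(projected_finish(date(2014, 1, 6), date(2014, 1, 21), 3, 20))
# -> 2014-04-16; compare that against the target release date.
```

Each completed feature gives us a new data point, and a new run of this projection tests our hypothesis again.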
This experiment helps us make decisions. If completing the features by the target date looks unlikely, we’ll want to take drastic action. Perhaps we’ll eliminate some features, or make them all simpler, in order to trim scope and achieve some success. Perhaps we’ll decide to cancel the project altogether, cutting our losses with only a fraction of the budget spent.
If the target still looks feasible, we can continue the experiment. We’ll still have uncertainty about both the rate of progress and the size of the work, but we can reason about those uncertainties. Are our errors in sizing likely to be additive, or random? Is the current rate of progress sustainable? Is it depressed because of one-time startup work? Or is it optimistic because we’ve been cutting corners?
Poorly handled estimation is a means to fool ourselves, but handled with care, it gives us tools to experiment and learn.
There have been some web posts and Twitter comments lately that suggest some people have a very narrow view of what techniques constitute an estimate. I take a larger view: any projection of human work into the future is necessarily an approximation, and therefore an estimate.
I often tell people that the abbreviation of “estimate” is “guess.” I do this to remind people that they’re just estimates, not data. When observations and estimates disagree, you’d be prudent to trust the observations. When you don’t yet have any confirming or disproving observations, you should think about how much trust you put into the estimate. And think about how much risk you have if the estimate does not predict reality.
This does not mean, however, that you have to estimate by guessing. There are lots of ways to make an estimate more trustworthy.
Using more people to independently estimate is one common technique and provides a reasonableness check on the result. Wideband Delphi techniques further this by then re-estimating until the predictions converge (or stalemate). People have widely adapted James Grenning's "planning poker" to perform this procedure. In theory, having multiple independent estimates misses fewer important points and gives us a more trustworthy result.
In practice, the various estimates are often less independent than we think. A group that works closely together can often guess what each other is thinking about the kind of work they commonly do. In addition, some participants often telegraph their estimates before others have decided, spoiling the independence. A further problem is that variations in skills and abilities give some people an advantage in estimating work aligned to their strengths, yet the estimates of those less knowledgeable about the work are often given equal weight, skewing the results. This is especially true when estimating things that have been broken down into small amounts of work.
Estimating relative to other work is easier for people, and therefore more reliable than estimating in absolute terms. I can look at two similar rocks and guess which one is heavier, or if they’re about the same, without knowing what either one weighs. This is the genesis of “story points.” Once we’ve assigned a value to one piece of work, then we can estimate others as multiples or fractions of that reference. Using affinity grouping, we can gather together all the work items that seem about the same size.
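To make the arithmetic concrete, here's a small hypothetical sketch; the story names, the two-day reference guess, and the multiples are all invented:

```python
# Relative sizing: only the reference story gets an absolute guess;
# every other story is a multiple or fraction of it.
reference_days = 2.0          # our guess for the one calibrated story
relative_sizes = {            # multiples of the reference story
    "login page": 1.0,
    "password reset": 0.5,
    "report export": 3.0,
    "audit trail": 5.0,
}

total_days = reference_days * sum(relative_sizes.values())
print(f"Rough total: {total_days:.0f} days")  # -> Rough total: 19 days
```

Given the earlier advice on precision, we'd report that as "about 20 days," not 19.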
Unfortunately, we often have a harder time seeing the size of development project work than we do of rocks. Using the rock metaphor, we might be trying to compare a chunk of talc with a piece of uranium ore. Apparent size is sometimes deceiving. People also have a tendency to hold onto absolute references. They want their story points to be comparable from team to team, or from year to year. They want to adjust their estimates after the fact so that items that took about the same amount of time are given similar values. "We estimated that as a 2, but it turned out to be a 5." They try to fix the story points to an absolute time or work reference, and in the process they make them less trustworthy by undermining the relative comparison that made them useful in the first place.
Estimating based on recent history is an excellent way to improve the reliability of estimates, especially for the short term. The XP practice of Yesterday's Weather is one example of this. "If we completed 24 story points last iteration, we'll probably complete about 24 story points this iteration." Bob Payne and I took a look at some data we had from teams with whom we'd worked, and found that we could generally do as well, or better, by just counting the stories instead of estimating them in points. In other words, saying "We completed 8 stories last iteration, so we'll probably complete about 8 stories this iteration" had about the same predictive power as using story points, and was a lot quicker to calculate. This was true even when the story estimates varied by about an order of magnitude. Others, such as Vasco Duarte, have noticed the same phenomenon. Taking the story points out of the equation seems to remove some of the noise in the data, and certainly removes some of the effort required. If you want to get better, use what I call the Abbreviated Fibonacci Series, which has the values of "1" and "too big." Split the stories considered too big. You'll accrue benefits beyond better estimates.
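Here's a small sketch of that comparison, with made-up iteration history; the numbers are only illustrative, but they show why counting stories can forecast about as well as summing points:

```python
# Forecast remaining iterations two ways: by story count and by points.
history = [
    {"stories": 7, "points": 22},
    {"stories": 9, "points": 25},
    {"stories": 8, "points": 24},
]
remaining_stories = 40
remaining_points = 120

avg_stories = sum(it["stories"] for it in history) / len(history)
avg_points = sum(it["points"] for it in history) / len(history)

print(f"By count:  {remaining_stories / avg_stories:.1f} iterations")  # 5.0
print(f"By points: {remaining_points / avg_points:.1f} iterations")   # 5.1
```

When the two forecasts agree this closely, the points aren't earning their keep.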
If velocity gives us a frequency measurement in stories per iteration, then its inverse is cycle time. Cycle time is the time it takes to complete one story, equivalent to a wavelength measurement. Once a team has some track record, you can generally expect these numbers to settle down into something fairly predictable. Because these estimates are based on data, many people are tempted to treat them as data themselves. Remember, though, the disclaimer of investment managers: "Past performance is no guarantee of future results." Even if the team has a consistent track record, there may be a black swan or three right around the corner.
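The inverse relationship is simple arithmetic; with invented numbers:

```python
# Velocity as frequency, cycle time as wavelength (numbers invented).
iteration_days = 10            # one two-week iteration, in working days
stories_per_iteration = 8      # observed velocity

cycle_time = iteration_days / stories_per_iteration
print(f"{cycle_time:.2f} days of team time per story")  # -> 1.25
```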
Of course, all things are not always equal. Organizations have a distressing tendency to change the makeup of teams, which changes the rate at which the team accomplishes work. The work itself may change, and so may the team’s skill at dealing with the work.
These are just three categories for improving the trustworthiness of estimation. There are many other techniques for estimating. Most have advantages, and all have disadvantages. Even with our best attempts at improving estimates, the true goal is accomplishing the work. Ultimately, it's better to apply energy to that goal than to chase ever-better estimation.
In business, we’re often asked for estimates with too little context to understand the request. When that happens, we’re likely to expect the worst–that our estimate will be treated as a “guarantee not to exceed” and we’ll likely be in trouble at some time in the future. Of course we think that; we’ve been burned too many times in the past. Our fear of the consequences will encourage us to spend far too much time and effort trying to get the estimate “right” so we won’t be blamed.
If an estimate is really an estimate, then we know that it’s “wrong” in the sense that the subsequent actual reality is unlikely to equal it. The estimate is a guess, perhaps an educated guess, predicting the future. Predictions are hard, especially about the future.
Given these problems with estimates, why do we bother to make them at all?
We do so because we have a decision to make, and looking into the future helps us make that decision.
If we’re trying to decide which of two development efforts to pursue next, we’ll want to estimate the value and cost of each of them. Since both value and cost depend on the amount of time we spend doing the work, we’ll want to estimate the time and effort required.
If we’re trying to decide whether a project is worth doing, we want to estimate whether the value is greater than the cost. We may not care what are the actual value and cost, just that it will be profitable.
If project A is clearly more valuable and less costly than project B, it's clear which to work on, assuming its value also clearly exceeds its cost. But what if it's not so clear? If projects A and B are estimated at about the same value and about the same cost, which would you choose? Unless you have some guideline other than the estimates, it doesn't matter. And if project A will provide value that's only slightly more than its cost, then we shouldn't do it at all.
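One way to express that reasoning: trust the estimates only when they favor one choice by more than the error we expect in them. This sketch is hypothetical; the margin and figures are invented:

```python
def choose(a, b, margin=0.5):
    """Return the clearly better project, or None if it's a toss-up.

    `margin` stands in for how far off we expect our estimates to be.
    """
    net_a = a["value"] - a["cost"]
    net_b = b["value"] - b["cost"]
    if abs(net_a - net_b) <= margin * min(a["cost"], b["cost"]):
        return None  # too close to call on estimates alone
    return a if net_a > net_b else b

a = {"name": "A", "value": 300_000, "cost": 100_000}
b = {"name": "B", "value": 320_000, "cost": 120_000}
print(choose(a, b))  # -> None: decide on something other than the estimates
```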
These are just two simple examples. There are many different reasons that a business will want to make forward-looking estimates. There are some caveats with estimating, of course.
Remember that these are JUST ESTIMATES.
We should not trust our estimates too much. Estimates are not data. Even if we’ve done well at predicting the future in the past, we might get this one totally wrong. We should expect our estimates to be off a bit, one way or the other, from what reality shows later. So if our decision is a close call, we should consider what we’d do if we had no estimate, or if the estimate came out differently. Only when the estimate clearly favors one choice over another should we trust it. Even then, we should validate our decision with real data as we go along.
Estimate to an appropriate level of precision
Since we know that our estimates are not data about future events, we should consider the amount of precision we need to make our decision. There’s rarely a need to estimate project completion to the day, or project cost to the dollar. If we think a project is worthwhile if it costs $100,000 but not if it costs $200,000, then we don’t need to estimate with any more precision than to the nearest $50,000. Using greater precision than we need increases the cost of the estimate without increasing its value for helping us decide.
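A trivial sketch of that discipline, reusing the figures from the example above:

```python
def round_estimate(value, precision):
    """Round an estimate to the precision the decision actually needs."""
    return round(value / precision) * precision

print(round_estimate(137_500, 50_000))  # -> 150000
```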
Using an appropriate level of precision will also help us remember that these are estimates, not trustworthy data. There’s also some evidence that precision interferes with moving toward our goals.
Continually check and refine your estimates
Estimation should not be a phase that you do once and then hold onto. Estimates give us a model for judging our progress against our expectations, letting us track our investment toward our goal. The actual progress acts as a check on our estimate, potentially giving us early warning of future disappointment, early enough to do something about it.
In the early 2000s, it was often said that “you don’t need to do design in Agile software development.” A lot of people were very dismissive of Agile software development because of comments like that. In truth, you do need to do design. You just don’t need to do it all up front. You need to do enough to see where you’re headed, and continue to question it and refine it as you learn more about what you need from the design. The same applies to estimation. We project into the future given what we know, track our progress, and make new projections as we learn more.
Don’t make this a big job, though. They’re still “just estimates.” Do a little at a time, but look at them frequently and regularly. Cross-check with other ways of estimating. Estimates are tentative; don’t put your full weight on them.
When they’re not agreeing with the subsequent reality, don’t blame the reality. Estimates give you a map; they’re not the territory. When these disagree, trust the reality. Does our decision still make sense? Would we decide differently, given what we know now? Are there possibilities that we didn’t consider when we made our first estimates? Should we change our decision, given where we are and what we know?
The estimates are not the goal; they're just a tool. When we find they're "wrong," there's no sense blaming the estimates or those who made them. They're now a historical record of our previous thoughts and knowledge. Instead, use them as they're meant to be used: as a tool for making decisions. And make new decisions.
Suppose you have a number of products, or a number of applications, that share some common functional needs. It seems obviously reasonable to create a separate team to build those functions in common. Often these grow to become known as a framework, and the product or application teams are expected to use it.
It’s a seductive concept, but don’t do it. Why not? I can think of several reasons.
The first is that good frameworks aren't built before being used. Instead, they're extracted from successful products or applications. For anything of size, there are bound to be subtleties that you won't get right until you try it for real. If you're building a framework first, you're delaying that first real trial. If the team using the framework is different from the one building it (as things are usually arranged), there's significant delay for improvements needed by actual use. And that assumes the application team even knows that the framework could and would be modified to accommodate them. Most often, they'll just live with the problems.
Another reason is that your framework will never justify its cost as an internal product. When frameworks are built for sale, they either have a large customer base depending on them or they go off the market. Internal frameworks do not have that large customer base, and can never pay back enough to cover the inefficiencies of working that way. Instead, you'll have diverted the best and brightest of your developers to a framework that can't turn a profit, starving your actual product of talent.
If you want to build an internal framework, build it from actual use. Put those best and brightest developers on the front lines of customer need, where they can do the most good. Distribute them among your application teams. Have them talk with each other, determining the common needs, and extract the framework as they go. They can act as Component Stewards, guiding the framework development and also guiding the use of it. They can then be in a position to know when the application should adjust to the framework, or the framework should accommodate the application.
Adding a new team member to an existing team always introduces challenges. The introduction changes the makeup of the team, and if the team had jelled, it has to jell again with the new member.
Also, the new member has to learn about the team and its work. There are many tacit assumptions held within a team. It's impossible to document them all, and even if you could, both reading such a document and keeping it up to date would be herculean tasks.
One team addressed this with a simple, visible process. First, they brainstormed a list of topics that a new team member needed to know. It included things like how they used the story wall, who had what role on the team, the architecture of the system, the team working agreements, and the local Agile practices. These topics were written on index cards, one to a card. When a new team member came on board, they set up a section of wall with Backlog, In Process, and Done columns, and put the index cards in the Backlog column, in a rough approximation of the order to learn them.
Existing team members put post-it notes with their names on the cards they were prepared to help with. It was the new team member's responsibility to work through these cards, one or a few at a time. They would take a card, put it in the In Process column, and ask the person named on the post-it to help them learn whatever the card mentioned. Sometimes this took a few minutes, and sometimes it took several days to go over the topic on a card. As each card was completed, it was moved to the Done column.
The very last card was “update the new member backlog cards.” Since the newest member had just gone through the process, they were in the best position to update the deck, adding, removing, and reordering cards as appropriate. This put the deck in the best possible shape for the next new member, while the memories were still fresh.