InfoQ now has the talk I did on the Monte Carlo Method at Agile Australia (last year):
Oh no. Another guy who hates estimates?
But look back through the history of this blog and you’ll see a couple of ever-present threads.
- Teams (and a manager) looking for ways to improve their assessment of the time required to deliver a software project
- The idea that while it’s probably impossible, it doesn’t mean you shouldn’t try
There’s great value in the discussions needed to make such an assessment, to forecast a delivery schedule for a software project. To do so, the group must have agreed on how it will record its work, what that work is, what value it represents, what options there may be in the delivery of that value, and so on.
Ultimately and broadly though, I think the usefulness of a project forecast boils down to two main things.
- Having enough of an idea of the cost of delivery, in order to weigh that against the expected value derived from the effort
- Providing dependent and interested parties with a chance to organise around the release date
Agile teams have tended towards comparative estimation, based on an idea made popular by Mike Cohn. Comparative estimation done well is a wonderful facilitator of good team conversations, but it’s still an opinion, tied to the team’s analysis of the task at hand. So while it’s a really useful discussion tool, it is common to find within the resulting delivery forecasts those human traits of over-optimism and a desire to please the reader. We always think we’ll be able to do it faster than we can, and we always know that’s what people want to hear!
An alternative to this approach is one that takes an “external” view of a team’s history and makes a forecast based on probabilistic simulation.
In this post, I’m going to step through such an approach, one that I’ve been using with teams, to build a better picture of the likely delivery timeline of a medium-sized project. The idea uses Takt Time and the mathematical Monte Carlo method to determine a probable range of delivery dates.
Takt is a German word for a rhythm or beat. Like that of an orchestra or your heart. Takt Time describes the regularity of that beat, the time in between each. In production line manufacturing, Takt Time is the rate at which each finished item completes manufacture. At Toyota’s Melbourne plant for example, a Takt Time of 7 minutes would mean that a new finished Camry rolls off the line every 7 minutes.
By understanding Takt Time, a manufacturer is able to run the line at a pace that is in line with demand. Of course it also provides an easy way to work out how long it would take to produce large batches of items. You simply multiply the number of items required by the Takt Time.
You can read more about Takt Time and its place in the Toyota Production System here.
This concept becomes useful to our method when Takt Time is observed as the rate at which a team completes user stories over time: the time between the completion of successive stories. For our purposes, we want to capture this data in distinct segments, stories per week or per sprint say.
In the example below, I’ve diagramed the delivery pattern of one sprint for a fictitious agile team.
Each story has a Takt Time, rounded to the nearest half day. The first, delivered halfway through day 2, of course has a Takt Time of 1.5 days. The next, delivered 1 day later, therefore has a Takt Time of 1 day. Note also the two stories delivered on the last day at the same time. We observe the Takt Times in the same way, one as 2 days (after the story before it) and one as Zero days (after the one delivered at the same time).
At the end of the sprint, we make note of the average Takt Time (including the Zero!). In this case, 6 stories over 10 days means the sprint Takt Time is about 1.67 days.
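If you'd rather see that arithmetic spelled out, here's a rough sketch in Python. The first two and last two completion times come from the example above; the two in the middle (days 5.5 and 8.0) are made up purely so that six stories fill the 10-day sprint, and the function name `takt_times` is my own invention.

```python
def takt_times(completion_days):
    """Return the gap (in days) between successive story completions.

    The first gap is measured from the start of the sprint (day 0).
    """
    previous = 0.0
    gaps = []
    for day in sorted(completion_days):
        gaps.append(day - previous)
        previous = day
    return gaps


# Elapsed days at which each story was finished.
# 1.5, 2.5 and the two 10.0s follow the example; 5.5 and 8.0 are hypothetical.
completions = [1.5, 2.5, 5.5, 8.0, 10.0, 10.0]

gaps = takt_times(completions)
sprint_takt = sum(gaps) / len(gaps)  # the Zero counts as a sample too

print(gaps)                   # [1.5, 1.0, 3.0, 2.5, 2.0, 0.0]
print(round(sprint_takt, 2))  # 1.67
```

Because the gaps always add up to the elapsed time, the average is just 10 days divided by 6 stories. Keeping the individual gaps is still worthwhile, since they are the raw samples the simulation draws on later.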
You see where we’re going? In order to forecast projects, we simply multiply the number of user stories in the backlog by the Takt Time, right?
Project Duration = Takt Time x Number of User Stories
Of course we bloody don’t!
The formula is straightforward, and it’s the right one, but how do we account for the uncertainty of the Takt Time? We can’t simply assume that the team will deliver at that rate consistently in the future.
In How to Measure Anything (Finding the Value of Intangibles in Business), Douglas W. Hubbard argues that building ranges of data is an appropriate means of gaining insight for business decisions. He reasons that attempting to produce a detailed, specific prediction of a business outcome prone to uncertain variables rarely provides information more valuable for the purpose of decision-making. If the revenue of a $100,000 investment was painstakingly modelled to an assumed benefit of $481,907, would you make a different yes/no decision than you would if you had high confidence in an estimated benefit of “somewhere between $400,000 and $600,000”?
To build uncertainty into our model, we’ll mathematically simulate realistic project outcomes, to produce a range of likely delivery dates, each with an assumed level of confidence. That way we can discuss the various decisions that might be made in the event of each. Would our decision to proceed be different if we thought this would take exactly 8 sprints as opposed to “between 6 and 9 sprints”?
The Takt Time we observed for this sprint is just a sample, a typical example for this team. The next sprint will be different and over time, there’ll be the outliers, large and small. Over the course of 10 sprints, our team’s known Takt samples look like this.
But that range is really no help at all in making a forecast, is it?
Monte Carlo (with Bootstrapping)
Mathematical modelling helps people in all sorts of fields answer questions about likely real world outcomes. How do they extrapolate the results of a drug trial to a wider population for example? With even a limited observation of Takt Times, the Monte Carlo method (with Bootstrapping) can help us paint the bigger picture.
The method makes use of the real world data we’ve collected to build reasonable, probable future outcomes. Instead of assuming one consistent Takt Time, it builds a range of thousands and uses that to again build a range of probable delivery forecasts.
Let’s take five of the Takt Times from the list above. Any five, at random. These are real examples of the time between delivered stories for our team. The sum of those 5 times is therefore a realistic simulation of the total time to deliver 5 user stories. By dividing that total by the number of inputs, we derive another reasonably assumed sample point.
You’ll notice the duplicate in the random sampling. It’s completely reasonable that the team delivers two stories with the same Takt Time in a given period. This means that there are thousands of possible samples of 5 from within our collection. We have thereby been able to bootstrap a model that simulates realistic delivery outcomes.
Using a spreadsheet we do just that, assemble a list of 1000 realistic, highly probable Takt Times for the team. This can serve us as a faux history of delivery from which we can derive reasonable project delivery assumptions.
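For anyone who prefers code to spreadsheets, the same bootstrapping step might look something like this in Python. Treat it as a sketch rather than a copy of my sheet: the ten `observed_takt_times` are invented placeholder values, and the function name, its parameters and the fixed seed are all just assumptions for illustration.

```python
import random

# Hypothetical observed Takt Times in days. Swap in your own team's history.
observed_takt_times = [1.67, 2.5, 1.25, 2.0, 1.43, 3.33, 1.11, 2.0, 1.67, 2.5]


def bootstrap_takt_times(observed, sample_size=5, n_samples=1000, rng=None):
    """Build n_samples simulated Takt Times by resampling the observed ones.

    Each simulated value is the mean of sample_size observations drawn at
    random WITH replacement, so duplicates within a draw are expected.
    """
    rng = rng or random.Random()
    simulated = []
    for _ in range(n_samples):
        picks = rng.choices(observed, k=sample_size)
        simulated.append(sum(picks) / sample_size)
    return simulated


# A fixed seed only makes the run repeatable; drop it for fresh randomness.
simulated_takt_times = bootstrap_takt_times(
    observed_takt_times, rng=random.Random(42)
)
print(len(simulated_takt_times))  # 1000
```

Sampling with replacement is what makes this a bootstrap: the same observation can turn up more than once in a draw, just like the duplicate in the example above.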
Pulling it all together
The final step is to extend the simulation to a project. This is pretty much a repeat of the randomised selection of Takt Times to produce an acceptable range.
Let’s say our early understanding of the project is that it consists of 35 user stories. We now use the extrapolated range of Takt Times to randomly simulate a delivery time for that number of items. Again, we use a spreadsheet to simulate a 35 user story project, as might be delivered by our team, thousands of times. The resulting distribution is a remarkably interesting and valuable view of delivery possibilities.
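Here's how that project simulation could be sketched in Python. A few assumptions to flag: `takt_pool` is a tiny made-up stand-in for the thousand bootstrapped Takt Times, a sprint is taken to be 10 working days as in the earlier example, and I round each run up to whole sprints. The function and parameter names are hypothetical.

```python
import math
import random

# Stand-in for the bootstrapped Takt Times (days). Hypothetical values;
# in practice this would be the list of ~1000 simulated samples.
takt_pool = [1.2, 1.5, 1.7, 1.9, 2.0, 2.2, 2.4, 2.8]


def simulate_project(pool, n_stories=35, n_runs=5000, sprint_days=10, rng=None):
    """Simulate the project n_runs times; return the sprints needed per run.

    Each run draws one Takt Time per story (with replacement), sums them
    to get a duration in days, then rounds up to whole sprints.
    """
    rng = rng or random.Random()
    results = []
    for _ in range(n_runs):
        total_days = sum(rng.choices(pool, k=n_stories))
        results.append(math.ceil(total_days / sprint_days))
    return results


sprints_needed = simulate_project(takt_pool, rng=random.Random(7))
print(min(sprints_needed), max(sprints_needed))  # best and worst simulated runs
```

Each run is just the Project Duration formula applied with a different, randomly drawn Takt Time for every story, which is how the uncertainty finds its way into the forecast.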
This is what we’ve been mining for. An opinion on the likelihood of a certain delivery schedule that does not rely on the team’s opinion of the effort required, but on an observation of the probability of that team delivering a given number of user stories in a given period of time.
We can see that (starting from the left) 2% of the simulations indicated completion in 6 sprints. Not a great level of confidence. As we move right though, accumulated results build. 11% of results come in at 7 sprints (or less). This distribution is our forecast. We reach the 96% level at 13 sprints.
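Reading the confidence levels off the distribution is a simple running total. The sketch below shows one way to do it in Python. Only three points in the sample data come from the chart described above (2% at 6 sprints, 11% at 7 or fewer, 96% at 13); every other count in `run_counts` is invented to fill the gaps, and both function names are hypothetical.

```python
from collections import Counter


def cumulative_confidence(sprint_counts):
    """Return [(sprints, percent of runs finished in that many or fewer)]."""
    counts = Counter(sprint_counts)
    total = len(sprint_counts)
    running = 0
    table = []
    for sprints in sorted(counts):
        running += counts[sprints]
        table.append((sprints, 100.0 * running / total))
    return table


def sprints_for_confidence(table, target_pct):
    """Smallest sprint count whose cumulative percentage meets the target."""
    for sprints, pct in table:
        if pct >= target_pct:
            return sprints
    return None


# 100 illustrative simulation results: {sprints: number of runs}.
# Mostly made-up counts, shaped to hit the three percentages quoted above.
run_counts = {6: 2, 7: 9, 8: 15, 9: 20, 10: 20, 11: 15, 12: 9, 13: 6, 14: 3, 15: 1}
results = [s for s, n in run_counts.items() for _ in range(n)]

table = cumulative_confidence(results)
for sprints, pct in table:
    print(f"{sprints:>2} sprints or fewer: {pct:.0f}%")

print(sprints_for_confidence(table, 85))  # 12
```

Asking "how many sprints for 85% confidence?" is then a lookup rather than a debate, which is handy when the conversation turns to cost and risk.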
The level of confidence is a dimension missing from most other methods of estimation. Depending on the cost or the risk involved, we can choose to proceed (or not) at any point on the scale.
Alright. I know there’s some detail in what I’ve described above, but really the process so far can be summarised as follows:
- Make an observation of a team’s Takt Time over several sprints
As few as 4-5 is enough to get started, and sprints might be weeks if sprints aren’t your thing
- Use random samples of that data to generate a simulated range
Bootstrap thousands of additional, probable Takt Times for the team.
- Sample those simulated Takt Times to build a project simulation.
Observe the resulting distribution
I’ve found the Monte Carlo method to be a very useful way of forecasting projects. To close out the post, here are some observations and notes to consider before you try it for yourself.
- It’s a good predictor of the timeline but….
- It relies on a pretty good backlog.
- Stories don’t need to be estimated for size individually but….
- You should be confident that the number of stories is about right
- Teams like it. Monte Carlo removes their ownership (blame/shame) of the forecast but….
- They also like comparing it to what they thought the timeline would be
So you’d like to try the Monte Carlo method for yourself? Well of course Scrumage wouldn’t dream of leaving you high and dry without a helping hand!
Here is the spreadsheet I’ve developed to simulate projects with my teams. It’s a Google sheet with instructions and (as with all my charts) you can make a copy and use it as you please.
I’d love some feedback on the post, the sheet and to hear from you about how it goes!