Multi-team estimate calibration

In a perfect world, the measure of a user story point would be universal.

There’d be an exact formula for summarising the effort required to deliver a feature. All teams everywhere would have the same number of members and the same iteration length.

There’d be a world record for velocity, and a software development Olympics every four years. Software development proficiency world champions would be famous, adored by millions of kids longing to be software developers.

Actually, we might leave that to LeBron James; it’s not sounding so perfect after all.

Fact is, we DO have a perfect world. One with diversity and complexity, personality, cultural nuance and attitude. We are all different, and our teams are too. That said, it ought to be possible to at least have a consistent and comparable story point unit within one company right? I think so, and here’s what we’ve done to achieve it.

The reason to do this is to give the business a consistent currency. Inevitably, the performance of teams gets compared, and if the developers offer story points as their measure, then that is what the casual observer will compare (painful discussions about the futility and wasted energy of doing so aside).

The process began with an invitation: an e-mail to every developer, tester and designer we have.

Our teams are small, but they had diverged from what was (essentially) one bigger team that had been growing and evolving. Since we would now be estimating separately, we wanted to maintain consistency across all teams on the size of a three-point story, a five-point story and so on.

The invitation asked each team member simply to find and submit their perfect user story at every level of the point scale: “In your opinion, what is our perfect all-time eight-point story?”

I asked that this be done without discussion, and let it sit for a week or so. By the time the results were in, we had more than a dozen submissions. Each one included a reference to the user story itself and a very brief justification for its selection.

Next, we put a couple of hours aside to kick these around. The goal was to discuss each suggestion, and come to a consensus on a set of “poster child” user stories.

Sometimes it was easy. In fact, from the hundreds of user stories in our history, it was surprising to observe that two or three featured multiple times in the suggestions. Others were more difficult, and after some debate we took a vote.
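Spotting those repeat nominations is just a tally per point level. As a minimal sketch (the story IDs, data shape and `min_votes` threshold here are all hypothetical, not from our actual process):

```python
from collections import Counter

# Hypothetical submissions: each dict maps a point value to the story a
# team member nominated as the best example at that level.
submissions = [
    {3: "STORY-101", 5: "STORY-210", 8: "STORY-305"},
    {3: "STORY-101", 5: "STORY-198", 8: "STORY-305"},
    {3: "STORY-140", 5: "STORY-210", 8: "STORY-305"},
]

def consensus_candidates(submissions, min_votes=2):
    """Return, per point level, the stories nominated by at least min_votes people."""
    tallies = {}
    for entry in submissions:
        for points, story in entry.items():
            tallies.setdefault(points, Counter())[story] += 1
    return {
        points: [story for story, n in counter.most_common() if n >= min_votes]
        for points, counter in tallies.items()
    }

print(consensus_candidates(submissions))
# {3: ['STORY-101'], 5: ['STORY-210'], 8: ['STORY-305']}
```

Stories that clear the threshold become the easy consensus picks; everything else goes to discussion and, if needed, a vote.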

The result was a set of stories that now comprise an “estimation guide document”. Following our meeting, all attendees took the set of stories and again made brief comments, this time summarising the characteristics that they thought made each story such a good example.

All of this data, edited to be as concise as possible, lives in our wiki and as posters on the wall in the team room, for reference during estimation.

For now, it’s working well, but if I know anything for sure, it’s that we’re going to need to do this again and again over time, and maintain a discipline in referencing (and updating) the old stories. Time will tell if it’s worth the effort.
