Estimating When Work Starts and Completes for Agile Projects

A friend of mine sent me the following note:

I’m coming across a matter in my research that seems related to your previous interests regarding uncertainty about durations or points in time. To explain by example, let’s say I have a task that is specified as,

“Here’s a task that should take about 5 hours to do. Start on it Tuesday.”

 So, when does the task begin? Assuming normal working hours, it can start as early as 8 am on Tuesday, and as late as 5pm and still be in compliance with the instruction above. So the beginning of the task is in that range. Let’s say we plan to start at noon. 

How do we express the start time of the task? Is it a distribution with a mean of noon, and a 95% confidence interval (or 99%) extending from 8am to 5pm? Or perhaps it’s just a uniform distribution, with all start times equally likely in that range? 

Then, how do we express the finish time of the task? Say we do some estimation and find that our worst case estimate of the duration is 7 hours and best case is 4.5 hours. What’s the distribution of the finish time? Is it a distribution with a mean of 5 and a confidence interval between -0.5 and +2.0? If so, that’s not a normal distribution, but what is it? Certainly we all recognize being late is more likely than being early. 

I’m also wondering if there isn’t perhaps a sort of “interval within an interval” when talking about time. If I ask you to arrive at 4pm then you will likely interpret that there is a range of times that will meet my request, perhaps between 3:50 and 4:05. This isn’t precisely uncertainty, it is an interval implied in the specification of the time. Even if I ask you to arrive at 4:17pm then there is an implied 60 second interval from 4:17:00 to 4:17:59 that would meet my request. Layered on top of this “specification interval” is an interval reflecting the uncertainty about when you will actually arrive based on traffic, weather, other people, etc. Have you considered how this could be expressed as some kind of distribution? 

This could be a red herring, but some quick searching yielded an interesting paper that seems to apply a Weibull distribution to task duration. Do you think this is on the right track, or is there a simpler way to deal with it?

This is the answer I gave:

To start, you have three distributions in the story: 

  1. When the work begins, 
  2. The duration of the effort, 
  3. The finish date. 

When you have 1 and 2, you can apply Monte Carlo simulation to derive the distribution of the finish date. Based on your story, a good choice for 1 is a triangular distribution, with the best-case start at the beginning of the day, the worst case at the end of the day, and the expected value at noon. For 2, one could also use triangular distributions based on the engineers' best-case, likely-case, and worst-case assessments of the time to deliver. Alternately, one might use Weibull distributions for 2, but in my opinion they do not have enough degrees of freedom. To get 3, one adds 2 to 1, and the Monte Carlo simulation carries out that arithmetic on the distributions.
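As a sketch of that arithmetic (my own illustration, not part of the original exchange), here is a minimal Monte Carlo in Python that combines a triangular start-time distribution with a triangular duration distribution, using the hours and bounds from the story:

```python
import random

random.seed(42)
N = 100_000

finishes = []
for _ in range(N):
    # start offset in hours after 8am: earliest 8am (0), latest 5pm (9), mode noon (4)
    start = random.triangular(0, 9, 4)
    # duration in hours: best case 4.5, worst case 7, most likely 5
    duration = random.triangular(4.5, 7, 5)
    finishes.append(start + duration)

finishes.sort()
p50 = finishes[N // 2]         # median finish offset from 8am
p95 = finishes[int(N * 0.95)]  # 95th-percentile finish offset
```

Note that the sum of two triangular variables is no longer triangular; the simulation gives the empirical finish distribution directly, which is the point of doing the arithmetic by Monte Carlo.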

A more sophisticated approach is to assume the work will start as soon as the engineer is available (or soon after). In that case, one would base 1 on a more elaborate simulation that includes the work already in the engineer's pipeline. We take this approach in the ANDES (aka Delivery Analyst) prototype being developed at IBM Research.

Again, any comments? I would especially love to hear about folks' experiences with Weibull distributions.

February 24th, 2014 | News | 4 Comments

About the Author:

I'm founder and CTO of Aptage. I help management teams and professionals manage better in the face of uncertainty.


  1. Troy Magennis February 24, 2014 at 9:56 pm - Reply

    Hi Murray,

    Thanks for posting this article.

    I believe software lead times follow the Weibull distribution because our work has a base amount of known effort, plus or minus some uncertainty, followed by various combinations of unrelated delays: waiting for another person to test, waiting for more information, queued waiting for release operations, etc.

    These delays have nothing to do with the original task: a test environment being down doesn't care about the content of the work. They are uncorrelated causes.

    Out of every combination of delays an organization has seen, it's unlikely that NO delays will occur, and it's unlikely EVERY delay will occur; most likely a few will hit each task. To illustrate this: if we Monte Carlo a base amount of work plus n possible delays, each occurring with probability p < 1 (say 0.25), you end up with a Weibull-like distribution. Looking at real-world data, I see that time and time again.
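Troy's thought experiment can be sketched in a few lines of Python (my own illustration; the delay count, probability, and delay sizes are made-up parameters):

```python
import random

random.seed(7)

def lead_time_sample(base=5.0, n_delays=8, p=0.25, delay_mean=3.0):
    """Base effort plus independent delays, each occurring with probability p."""
    t = base
    for _ in range(n_delays):
        if random.random() < p:
            t += random.expovariate(1.0 / delay_mean)  # size of one delay
    return t

samples = sorted(lead_time_sample() for _ in range(50_000))
median = samples[25_000]
p99 = samples[49_500]  # long right tail: a few tasks hit many delays
```

The resulting histogram is right-skewed with a long tail, which is the qualitative shape a Weibull with shape parameter above 1 would fit.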

    Further, the Weibull family of distributions follows the characteristics of work types. For example, a Weibull with a shape parameter of 1.0 is the Exponential distribution. I see this most prominently in DevOps teams, where they are at the end of the chain and the person who pulls the work likely completes it, with a few small delay types. Then we have development teams, where I see Weibull shapes of 1.3 (Kanban) to 1.6 (Scrum), where there are more delay opportunities. And finally, a Weibull with a shape of 2.0 is the Rayleigh distribution, which matches the Putnam model for Waterfall development. In essence, I think Agile practice walks the Weibull distribution from a shape of 2.0 on a journey toward 1.0 or below.
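The two special cases named above can be checked numerically (a quick sanity sketch of my own, not Troy's data), since the mean of a Weibull with unit scale and shape k is Γ(1 + 1/k):

```python
import math
import random

random.seed(1)
N = 200_000
scale = 1.0

def sample_mean(shape):
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape
    return sum(random.weibullvariate(scale, shape) for _ in range(N)) / N

exp_mean = sample_mean(1.0)  # shape 1.0: Exponential, mean = scale
ray_mean = sample_mean(2.0)  # shape 2.0: Rayleigh, mean = scale * sqrt(pi) / 2
```

Both sample means land on the closed-form values, confirming that shape 1.0 and 2.0 really are the Exponential and Rayleigh endpoints of the family.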

    I don't model with the triangle distribution because:

    1. It ends at the max. I don't think this matches the real world: I want a distribution with a long tail out to infinity, but with VERY low probability there, to match the never-ending delays we sometimes see.

    2. I don't believe that shape occurs in any real-life scenario. I can see that in the days before my laptop could perform gene sequencing, the mathematics for the triangle were easier, but that's not a good reason for using it.

    3. The minimum for tasks in software projects, if you ask a developer, will always be 1 day! So I stopped asking 🙂

    4. Estimating the mode (the most likely value) is problematic. I find people underestimate it, and the triangle distribution fails to acknowledge common-mode delays (or, alternatively, I was a bad student). Looking at historical data, or asking a long-time employee for the worst they have seen, uncovers this bias, but those values would skew the forecast if used in a triangle. Enter Log-Normal or Weibull... something with a long tail.

    Here is my technique for Weibull estimating of story/PBI/task lead time:

    1. Ask the teams for the minimum task time they have ever seen for this type of work task (mostly 1 day, unless they are estimating a bigger chunk of the project).
    2. Ask the teams for the maximum task time they have ever seen for this type of work task (totally worst case; expect 75-100 days!).
    3. Weibull scale parameter ≈ (high - low) / 4.
    4. Weibull shape parameter ≈ 1.3-1.6 depending on process. I start with 1.5 and monitor actuals. If they are looking more exponential, reduce; if they are looking more Weibull, stay or increase.

    I have a proof and a deck on "Why Weibull" if you want to discuss one day. I'm confident, but still looking for that one data set that disproves my thesis!

    Troy Magennis

  2. […] I posted my previous blog entry on estimating when a task would be complete, I hoped that Troy Magennis would weigh in. He did with […]

    • murrraycantor March 4, 2014 at 2:25 pm - Reply

      Hi Troy,

      In my initial post, I was answering a colleague's rather narrow question. Our discussion has moved on to the broader question of how one would elicit task-based input to predict project lead time.

      In our research tool, we simply assume the workers start work as soon as they are available, and we do account for their availability. We simulate by cascading tasks based on the predicted availability of staff with the required skill sets.

      There are interesting differences in our techniques. We elicit the inputs in the Hubbard style, and we set the expectation that the actuals will fall in the 10%-90% range. We use the nominal (expected) value, believing that the input contains information that is useful for the simulation. We do this for a couple of reasons.

      1. My experience is that developers' gut feel has some merit and so should be taken into account (better than a maximum-entropy assumption)

      2. Its use starts a constructive conversation with the developers

      That said, we do focus on the range of the distribution and seek out the reasons for it. We do this to elicit their assessment of the sources of uncertainty, so that work to reduce the uncertainty can be identified and prioritized. This seems to be in the same spirit as your use of the uniform distribution.

      Of course, we do not ask anyone to estimate anything beyond their influence. We do not ask anyone to guess how the whole program will perform.

      Also, as the actuals come in, we use a sort of Bayesian refinement to update the estimates as evidence of project performance accrues. That way we overcome bad initial estimates.
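The post doesn't describe the refinement mechanism, but one common form of such an update is a conjugate normal-normal blend (entirely my own assumption, not the tool's actual method): combine a prior mean-duration estimate with observed actuals, weighted by precision, so that bad initial estimates are gradually overruled by evidence.

```python
def refine_estimate(prior_mean, prior_var, actuals, obs_var):
    """Precision-weighted blend of a prior duration estimate and observed actuals."""
    n = len(actuals)
    sample_mean = sum(actuals) / n
    post_precision = 1 / prior_var + n / obs_var
    post_mean = (prior_mean / prior_var + n * sample_mean / obs_var) / post_precision
    return post_mean, 1 / post_precision

# prior estimate: 5 days with variance 4; three observed actuals pull it upward
mean, var = refine_estimate(5.0, 4.0, [7.5, 8.0, 7.0], 1.0)
```

As more actuals accumulate, the posterior mean moves toward the observed average and the posterior variance shrinks, which is the qualitative behavior the comment describes.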

      Finally, we understand and teach that the estimate we deliver is the best possible given the current information. We cannot account for information we do not have. However, we do allow users to override some of the computed variables, such as team velocity; they may have information that is not in the tool.

      PS. Yes, Weibull is a family of distributions. I somehow have missed that the Rayleigh distribution is a special case. Thanks.

      • troymagennis March 4, 2014 at 4:31 pm - Reply

        What you do in the research tool makes great sense: cascading tasks through staff availability by skill. Do you sensitivity-test skill by skill and suggest which skill is in most demand?

        Using the actual data samples to correct and adjust other estimates when real data starts to flow is key. I find the defect rates and “discovered” scope rates are the estimates that need the most adjustment.

        I get you: as a conversation, your process gets the developers/testers/team to compare their range estimates regularly, re-calibrating them on how their part of the system performs, learning where their impediments come from, and how to think about estimating tasks next time. In a sense, they will quickly be estimating the system. I wonder how long before the distribution of historical samples would eliminate the need for them to estimate at all? I'm still of the camp that people should estimate because of what they learn.

        I only mentioned the Weibull family to note that our industry predecessors doing this work observed the Rayleigh, and now we can map that to what we are seeing in Agile shifting the shape left. To me, this is just another data point to help understand whether we are on the right track and “why” cycle time follows a pattern. Then find ways to leverage that in some way... I have ideas, perhaps: how to quickly identify the most frequent, recent, and severe impediments that give true lead time its long tail.


        PS: Sorry, I don't get to chat much about this stuff with many people 🙂 Hope my comments aren't sounding critical; I'm just having fun discussing this stuff.
