A friend of mine sent me the following note:
I’m coming across a matter in my research that seems related to your previous interests regarding uncertainty about durations or points in time. To explain by example, let’s say I have a task that is specified as,
“Here’s a task that should take about 5 hours to do. Start on it Tuesday.”
So, when does the task begin? Assuming normal working hours, it can start as early as 8 am on Tuesday, and as late as 5pm and still be in compliance with the instruction above. So the beginning of the task is in that range. Let’s say we plan to start at noon.
How do we express the start time of the task? Is it a distribution with a mean of noon, and a 95% confidence interval (or 99%) extending from 8am to 5pm? Or perhaps it’s just a uniform distribution, with all start times equally likely in that range?
Then, how do we express the finish time of the task? Say we do some estimation and find that our worst case estimate of the duration is 7 hours and best case is 4.5 hours. What’s the distribution of the finish time? Is it a distribution with a mean of 5 and a confidence interval between -0.5 and +2.0? If so, that’s not a normal distribution, but what is it? Certainly we all recognize being late is more likely than being early.
I’m also wondering if there isn’t perhaps a sort of “interval within an interval” when talking about time. If I ask you to arrive at 4pm then you will likely interpret that there is a range of times that will meet my request, perhaps between 3:50 and 4:05. This isn’t precisely uncertainty, it is an interval implied in the specification of the time. Even if I ask you to arrive at 4:17pm then there is an implied 60 second interval from 4:17:00 to 4:17:59 that would meet my request. Layered on top of this “specification interval” is an interval reflecting the uncertainty about when you will actually arrive based on traffic, weather, other people, etc. Have you considered how this could be expressed as some kind of distribution?
This could be a red herring, but some quick searching yielded an interesting paper that seems to apply a Weibull distribution to task duration (http://digitalcommons.wayne.edu/cgi/viewcontent.cgi?article=1502&context=jmasm). Do you think this is on the right track, or is there a simpler way to deal with it?
This is the answer I gave:
To start, you have three distributions in the story:
1. When the work begins,
2. The duration of the effort,
3. The finish date.
When you have 1 and 2, you can apply a Monte Carlo simulation to derive the distribution of the finish date. Based on your story, a good choice for 1 is a triangular distribution, with the best case at the start of the day, the worst case at the end of the day, and the most likely case at noon. For 2, one could also use a triangular distribution based on the engineers' best-case, likely-case, and worst-case assessments of the time to deliver. Alternatively, one might use a Weibull distribution for 2, but in my opinion, Weibull distributions do not have enough degrees of freedom. To get 3, one adds 2 to 1. Because both are distributions rather than single numbers, one does this arithmetic with Monte Carlo simulation: draw a start time and a duration at random, add them, and repeat many times.
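The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: times are clock hours on Tuesday (8.0 is 8am, 17.0 is 5pm), noon is treated as the peak (mode) of the start-time triangle, the 4.5/5/7 hour figures from the note are the best, likely, and worst durations, and the work is assumed to run continuously once started, ignoring the end of the working day. The function name and parameters are made up for the example.

```python
import random
import statistics


def simulate_finish_times(n_trials=50_000, seed=42):
    """Monte Carlo sketch: finish time = start time + duration.

    Assumptions (illustrative only): clock hours on a single day,
    triangular distributions for both inputs, and no overnight break.
    """
    rng = random.Random(seed)
    finishes = []
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode)
        start = rng.triangular(8.0, 17.0, 12.0)    # 8am to 5pm, peak at noon
        duration = rng.triangular(4.5, 7.0, 5.0)   # best 4.5h, worst 7h, likely 5h
        finishes.append(start + duration)
    return finishes


finishes = sorted(simulate_finish_times())
mean_finish = statistics.fmean(finishes)
p05 = finishes[int(0.05 * len(finishes))]
p95 = finishes[int(0.95 * len(finishes))]

print(f"mean finish (clock hour): {mean_finish:.2f}")
print(f"5th to 95th percentile:   {p05:.2f} to {p95:.2f}")
```

The sorted samples are the finish-time distribution, so any percentile or histogram can be read straight off the list. Swapping in another distribution for the duration only means replacing the one line that draws it.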
A more sophisticated approach is to assume the work will start as soon as the engineer is available (or soon after). In that case, one would base 1 on a more elaborate simulation that includes the work already in the engineer's pipeline. We take this approach in the ANDES (aka Delivery Analyst) prototype being developed at IBM Research.
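As a rough illustration of that idea (this is my own toy sketch, not how ANDES works), suppose the engineer's pipeline is a list of queued tasks, each with a best, likely, and worst estimate in working hours. The start of the new task is then the simulated time at which the queue empties. The function name, the backlog format, and the numbers are all hypothetical.

```python
import random


def simulate_start_after_backlog(backlog, n_trials=20_000, seed=7):
    """Simulate when an engineer becomes free to start a new task.

    backlog: list of (best, likely, worst) estimates in working hours
    for the tasks already queued (a hypothetical format). Returns the
    simulated start times, measured in working hours from now.
    """
    rng = random.Random(seed)
    starts = []
    for _ in range(n_trials):
        # The engineer is available once every queued task is finished.
        available = sum(
            rng.triangular(best, worst, likely)  # (low, high, mode)
            for best, likely, worst in backlog
        )
        starts.append(available)
    return starts


# Two queued tasks with made-up estimates.
starts = sorted(simulate_start_after_backlog([(2.0, 3.0, 6.0), (1.0, 1.5, 4.0)]))
print(f"median start: {starts[len(starts) // 2]:.2f} working hours from now")
```

Adding a sampled duration for the new task to each simulated start gives its finish-time distribution, exactly as in the simpler case.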
Again, any comments? I would especially love to hear about folks' experiences with Weibull distributions.