Tim Walker

Story Points, Lean Principles and Product Development Flow

A “story point” is a unit of measure commonly used to describe the relative “size” of a user story (an agile requirements artifact) when it is being estimated. In many cases, a Fibonacci sequence of 0, 1, 2, 3, 5, 8, 13, 21, ... is used during the estimation process to indicate this relative size. One reason for this is to increase the speed at which an estimate is derived. For example, it’s more difficult to agree on the difference between a 5 and a 6 than it is to agree between a 5 and an 8.

It is very common for agile teams to use story points to describe the “level of effort” required to complete a user story. That is to say, a user story estimated at 5 points should be expected to take five times as long as a user story estimated at 1 point. On the other hand, it is also very common for people to include effort, risk, and uncertainty (or complexity) in the definition of a story point. That is, an 8 is also four times riskier, and four times more uncertain, than a 2.

Mike Cohn has stated that it is a mistake to do this:

“I find too many teams who think that story points should be based on the complexity of the user story or feature rather than the effort to develop it. Such teams often re-label “story points” as “complexity points.” I guess that sounds better. More sophisticated, perhaps. But it's wrong. Story points are not about the complexity of developing a feature; they are about the effort required to develop a feature.”

So what do we do with this?

In Donald Reinertsen’s groundbreaking work “The Principles of Product Development Flow” we get a clear sense of how “batch sizes” affect our product development flow. Without going into much detail, it is fair to say that story points describe the “batch size” of a user story, where batch size is commonly defined as the “quantity of product worked on in one process step”.

Reinertsen describes several principles relating batch size to product development flow. For example, Principle B1, “The Batch Size Queueing Principle: Reducing batch size reduces cycle time”. Cycle time is how long it takes, or the level of effort required, to complete the story. Smaller stories mean less effort than large ones. This principle supports the story point as a relative unit of effort, as Cohn suggests.
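To make the batch size intuition concrete, here is a toy sketch in Python (the numbers and the FIFO, one-point-per-day processing model are my own assumptions, not figures from Reinertsen’s book). With the same total work and the same processing rate, smaller batches finish, and can be inspected, sooner on average:

# Toy model: batches are processed FIFO at `rate` points per day.
# A batch is only "done" (and visible for feedback) when all of it is done.
def mean_completion_time(batch_sizes, rate=1.0):
    elapsed, finish_times = 0.0, []
    for size in batch_sizes:
        elapsed += size / rate
        finish_times.append(elapsed)
    return sum(finish_times) / len(finish_times)

print(mean_completion_time([8]))            # one 8-point batch   -> 8.0 days
print(mean_completion_time([2, 2, 2, 2]))   # four 2-point batches -> 5.0 days

The total elapsed time is the same in both cases, but the small batches deliver finished, inspectable work much earlier on average, which is the heart of B1 and, as we will see, B3.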

There are other principles that might not be so supportive of this perspective.

For example, looking ahead to the second batch size principle, Principle B2, “The Batch Size Variability Principle: Reducing batch size reduces variability in flow”, we begin to see the challenges of treating story points as a “relative level of effort comparison” only. If we have significantly increased variability in the size of the story, should we still “expect” a “5” to behave the same as five “1”s? All we can really “expect” is more variability: increased risk, schedule delays, and unknowns.

Let’s look at a few more principles.

B3: The Batch Size Feedback Principle: Reducing batch size accelerates feedback.
In our Scrum process, for example, a large batch increases cycle time (B1), which means the product owner doesn’t see the work product as quickly and, in some cases, begins to lose faith or worry, and increases pressure on, or interruptions of, the team. Fast feedback is a cornerstone of agile and of product development flow.

Large batches also run up against B7, “The Psychology Principle of Batch Size: Large batches inherently lower motivation and urgency”. We like to see things get done; it makes us happy. As humans, it takes us more time to get going on a huge job, whereas a simple one we might just knock out and get on to the next.

Very little good comes from large batch sizes as they relate to product development flow. There are 22 batch size principles, and none of them is supportive of large batches.

For example, some of them are:
B4: The Batch Size Risk Principle: Reducing batch size reduces risk.
B5: The Batch Size Overhead Principle: Reducing batch size reduces overhead.
B6: The Batch Size Efficiency Principle: Large batches reduce efficiency.
B8: The Batch Size Slippage Principle: Large batches cause exponential cost and schedule growth.

And so on. Large stories are bad. Really bad. So what can we do about this in our agile software development process? The most important thing is that we cannot “expect” an “8” to take four times as long as a “2”. “The Principles of Product Development Flow” tells us that this is simply not a realistic expectation.

We need to understand that, regardless of size, these are estimates, not “exactimates” as The Agile Dad says. We need to accept that our estimate of a larger story is less accurate than our estimate of a smaller one.

We need to understand that, since we cannot commit to an unknown, asking people to commit to large stories is unrealistic and violates our lean principle of respect for people.

We need to understand how batches affect our queues, our productivity, predictability and flow. We should study and apply product development flow.

We need to understand that our flow will be better with smaller stories, so learning the skills to break stories down into smaller sizes will help. We may decide, for example, never to allow anything larger than a “5” into a sprint.

With that said, if we have “8”s in our backlog that (for some reason) can’t be broken down into smaller stories, we should compensate in our velocity load estimates for a given sprint. For example: if we have an estimated velocity of 20 and all we have are stories estimated at “8”, we might want to schedule only two of these stories in the sprint, leaving us some capacity margin for safety.
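As a quick worked sketch of that capacity math (Python, using the numbers above; the 20% safety margin is an assumed figure, chosen only to illustrate the idea):

velocity = 20        # estimated velocity, in points
story_size = 8       # only large "8" stories are available
margin = 0.20        # assumed safety margin for large-story variability

usable = velocity * (1 - margin)       # 16 points of usable capacity
stories = int(usable // story_size)    # how many 8s fit under the margin
slack = velocity - stories * story_size
print(stories, "stories scheduled,", slack, "points of slack")  # 2 stories, 4 points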

One thing to note, however, is that smaller stories might not be the best solution for geographically distributed teams, according to B17, “The Proximity Principle: Proximity enables small batch sizes”. It might be more effective to have the remote team work on a larger story, which they might then break into smaller stories locally for efficiency. We need to understand the economics of batch handoff sizes and balance our efforts accordingly through reflection and adaptation.

Managing Non-Functional Requirements in SAFe

Managing non-functional requirements (NFRs) in software development has always been a challenge. These “system capabilities”, such as how fast a page loads, how many concurrent users the system can sustain, or how vulnerable we can afford to be to denial-of-service attacks, have traditionally been ascribed to “quadrant four” of Brian Marick’s agile testing quadrants. That is, these are tests that are technology-facing and that critique the product. That said, it has never been clear *why* this is so, as this information can be critical for the business to clearly understand.

In the Scaled Agile Framework (SAFe), NFRs are represented as a symbol bolted to the bottom of the various backlogs in the system, indicating that they apply to all of the other stories in the backlog. One of the challenges of managing them lies in at least one aspect of our testing strategy: when do we accept them if they represent a “constant” or “persistent constraint” on all the rest of the requirements?

This paper advances an approach to handling NFRs in SAFe that promotes treating NFRs as first-class objects in our business-facing testing and dialogs. It suggests that the business would be highly interested in knowing, for example, how many concurrent users the system can sustain online. If you’re not sure about this, just ask the business people around the healthcare.gov project! One outcome of this approach is that a process emerges that reduces our need to treat NFRs as a special class of requirements at all.

If we expose the NFRs to the business, in a language and manner that create shared understanding of them, we can avoid surprises while solving a major challenge.

Please consider the following Gherkin example:

Feature: Online performance
  In order to ensure a positive customer experience while on our website
  I’d like acceptable performance and reliability
  So that the site visitor will not lose interest or valuable time

  Scenario: Maximum concurrent signed-in user page response times
    Given there are 1,000 people logged on
    When they navigate to random pages on the site
    Then no response should take longer than 4 seconds

  Scenario: Maximum concurrent signed-in user error responses
    Given there are 1,000 people logged on
    When they navigate to random pages on the site for 15 minutes
    Then all pages are viewed without any errors

These are pretty straightforward, easy-to-understand test scenarios. If they were managed like any other feature in the system, their creation, elaboration, and implementation would serve as a ‘forcing function’, where value is derived in the form of shared understanding between the business and the development team. These directly executable specifications could also be automated so that they run against every build of the software. This fast feedback is very important to development flow: if we check in a change, perhaps a configuration parameter or a new library, that breaks any NFR, we’d know immediately what changed (and where to go look!). Something that is also very valuable (and often overlooked!) is that each build serves as a critical, ongoing baseline for comparison of performance and other system capabilities.
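To sketch what that automation could look like, here is a minimal, assumed implementation of the first scenario using Python’s behave BDD library and the requests HTTP client. BASE_URL and the page list are hypothetical placeholders, and a real load test would use a purpose-built harness rather than a simple thread pool:

import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests
from behave import given, when, then

BASE_URL = "https://staging.example.com"        # hypothetical test target
PAGES = ["/", "/account", "/catalog", "/help"]  # hypothetical page sample

@given("there are 1,000 people logged on")
def step_logged_on(context):
    context.user_count = 1000  # a real harness would establish sessions here

def _timed_request(_):
    start = time.monotonic()
    requests.get(BASE_URL + random.choice(PAGES), timeout=30)
    return time.monotonic() - start

@when("they navigate to random pages on the site")
def step_navigate(context):
    with ThreadPoolExecutor(max_workers=context.user_count) as pool:
        context.response_times = list(
            pool.map(_timed_request, range(context.user_count)))

@then("no response should take longer than 4 seconds")
def step_check_response_times(context):
    assert max(context.response_times) <= 4.0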

Any NFR expressed in this fashion becomes a form of negotiation. It makes visible economic trade-off possibilities that might not otherwise be well understood by the business. For example, if push came to shove, would there still be business value if, under sustained load, some page responses stretched to 5 seconds?

Another benefit of writing the test first is that it increases the dialog about *how* we will implement the NFR scenario, which helps to ensure, by definition, that a “testable design” emerges.

This approach to requirements/test management is known as “Behavior-Driven Development” (BDD) or “Specification by Example”. The question of how and when to implement these stories in the flow remains a challenge, and the remainder of this article addresses it directly by describing one solution in SAFe.

The recommendation is to implement the NFR as an executable requirement, using natural-language tools like Cucumber or SpecFlow (both of which support Gherkin) or Fit/FitNesse (which uses natural language and tables), as soon as it is accepted as an NFR in an iteration as part of the architectural flow. Create a Feature in the Program backlog that describes implementation of the actual NFR (load, capacity, security, etc.) and treat it like any other feature from that point. Have the system team discuss, describe, and build the architectural runway to drive the construction of the systems that will support testing them. Use the stories as acceptance criteria against the architectural runway, if that is appropriate. If you do not implement the actual test itself right away (not recommended), at least wire it up to produce a “Pending” test failure (I’ll describe that more in a moment). When the scenarios are running in your continuous integration (CI) environment, the story can be accepted. With regard to your CI, keep in mind that some of these tests, with large data sets or long up-time requirements, will take a while to complete, so it is very important to separate them from your fast-failing unit tests.
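As a sketch of that “Pending” wiring (again assuming behave; raising NotImplementedError is one simple way to make a not-yet-built step fail loudly rather than pass silently):

from behave import when

@when("they navigate to random pages on the site for 15 minutes")
def step_sustained_load(context):
    # The sustained-load harness isn't built yet; fail loudly so CI
    # reports this NFR as pending rather than silently passing.
    raise NotImplementedError("Pending: sustained-load harness not implemented")

For the fast/slow separation, one common convention is to tag long-running NFR features (for example, @slow) and exclude that tag from the fast CI stage, running those suites in a separate nightly or on-demand job.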

The next important step is to make these tests visible to the business and to the development team. For the business, one way to make them visible, along with your other customer-facing acceptance tests, is to use a tool like Relish, which can publish them along with markup and images, as well as navigation and search.

Another recommendation in this approach is to build a “quality” dashboard using the testing quadrants described earlier. That is, each quadrant would report a pass/fail/pending status that could be used for governance and management of the system. When all quadrants are green, you can release. You can get quite creative with this approach and use external data sources, such as Sonar and CAST (coverage and code quality tools, respectively), and even integrate Q3 exploratory testing results, for example. There is work to be done in this area; hopefully someone will write a Jenkins plugin or add this to a process management tool.
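A minimal sketch of what such a dashboard might aggregate (Python; the quadrant statuses and their data sources are assumptions, not an existing tool):

from enum import Enum

class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    PENDING = "pending"

# One rolled-up status per testing quadrant, fed (hypothetically) from
# CI results, Sonar/CAST, and recorded Q3 exploratory testing outcomes.
quadrants = {
    "Q1 technology-facing, supporting": Status.PASS,
    "Q2 business-facing, supporting": Status.PASS,
    "Q3 business-facing, critiquing": Status.PENDING,
    "Q4 technology-facing, critiquing": Status.PASS,
}

def releasable(board):
    return all(status is Status.PASS for status in board.values())

for name, status in quadrants.items():
    print(f"{name}: {status.value}")
print("Release gate:", "GREEN" if releasable(quadrants) else "BLOCKED")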

Using this approach, you will always know the status of your NFRs and get the information you need in a timely fashion, while there is still time to react. This approach helps to eliminate surprises and removes the need for a major (unknown-cost) effort at the end of your development cycle. In the case above, even if these tests had been marked “Pending”, you’d know that the status of these NFRs was unknown, which would increase trust and share responsibility across the entire value stream.

Learn more about the Scaled Agile Framework: download SAFe Foundations.

Learn more about our Scaled Agile Framework Training and Certification Classes.