A few weeks back I wrote about an AMR Research note on the significant problem HP encountered with an ERP migration; the problems had a material impact on HP's financial results last quarter. This week, Computerworld has an interview with Gilles Bouchard on what went wrong with the migration. It's a decent interview, and there are some things to which I would like to draw attention.
The first thing is that the ERP migration that caused the problem was the 35th such migration HP had performed as part of the broader project. The prior 34 evidently went well enough not to cause a major disruption to the business (although we don't know exactly how smoothly, given that Bouchard also says they planned on three weeks of significant business disruption for this migration). Regardless, the key takeaway is that, as they say, shit happens, and even with a proven methodology and approach, it is still critical to plan for contingencies.
One cynical observation I would make about the interview comes from a former client, a CIO at a global medtech company, who remarked to me (to paraphrase): "Some of these CIOs I read about must be masterful politicians. You read about projects that completely explode and there's no accountability anywhere in the IT organization." Bouchard certainly could be accused of this. Consider the following exchange:
Computerworld: What went wrong?
Bouchard: It's a lot of little things that added up. Not one or two of them would have created the problem, but the combination of them created a bigger problem than we expected. We had planned for three weeks of disruption, and there were six weeks of disruption.
These problems are in three major categories. One we call working across silos. The team that was driving this [consolidation] program had to work with other parts of the company -- they were dependent on other parts. And working across these seams proved difficult. People have different priorities, so it made program management a little more complex. For example, logistics was in a different group, and the front-end order-taking was in a different system, hosted in a different group. So there were a lot of these little disconnects.
Secondly, there were a lot of data integrity [problems]. Orders fell out between the legacy front-end system and SAP on the back end, which required a lot of manual intervention. Orders started falling out for a lot of different reasons -- training, [product data management] issues, all kinds of little things. So backlogged orders started doubling, and by the time people were finished fixing those issues, we had a fairly large backlog, which took several weeks to get rid of. The backlog was not resolved until the end of the quarter, which had an impact on our financials.
The third element was increased demand. This migration had to do with our industry-standard Intel-based server business. The demand really increased for those products, and it's still very high right now, which is good news. But it put even more pressure on the whole system.
You might have missed it. Did you notice that there were "a lot of little" problems that added up? Did you also notice that the third item isn't really a problem if the system worked? Nor does there appear to be any quantification of the impact of the "working across silos" problem. It made things difficult? OK, but really, what does that mean, given that work is often difficult?
What if we look at the second problem? This was the problem to which Bouchard attributes the financial impact on HP. Orders "fell out" between SAP and the legacy system. It kind of makes it sound like it's the orders' fault; these deviant orders just jumping out of line to play hooky. Bad order, bad.
Seriously, this is the problem that was the problem, and I will tell you that when System A & System B can't talk, it is a really, really hard thing to pin on the business, as integration work is almost purely a technical endeavor. (Although business users should understand it. See my previous note on "Things to Think about with Integration Projects.")
From a purely technical standpoint, integration projects may be challenging, but the upshot for developers is that the requirements are crystal clear and not really subject to change, except under certain circumstances, such as a new application version being released. If a technology project is failing because of system interfaces or integration, 99.5% of the time it will be the technical staff's responsibility.
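To make the "crystal clear requirements" point concrete, here is a minimal sketch of the kind of reconciliation check an interface test could run: every order the front end accepted should show up on the back end. The function name, the order IDs, and the idea that both systems can export a list of IDs are my own assumptions for illustration; nothing in the interview describes how HP's systems actually exchanged orders.

```python
def find_fallen_out_orders(front_end_ids, back_end_ids):
    """Return order IDs the front end accepted but the back end never received.

    Both arguments are hypothetical exports of order IDs from each system.
    """
    return sorted(set(front_end_ids) - set(back_end_ids))


# Made-up sample data: four orders taken, only two arrived on the back end.
front_end = ["A100", "A101", "A102", "A103"]
back_end = ["A100", "A102"]

missing = find_fallen_out_orders(front_end, back_end)
print(f"{len(missing)} of {len(front_end)} orders fell out: {missing}")
```

A check like this says nothing about why an order fell out, but run against test data before go-live it would at least show how many do.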
Now, fast forward to the very last question of the article:
Computerworld: So give us three bullet points on lessons learned and what other CIOs need to do to avoid the problems you encountered.
Bouchard: The big one would be work across silos and watch for the seams between organizations when you consolidate. Your system scope [needs to be] broader than your organizational scope.
The second point is you need proper program management.
The third one is that perfect storms do happen. We had planned for a three-week disruption and thought maybe two or three things would have to be fixed. Well, more things happened. We should have had a contingency plan for four, five or six weeks. Contingency means you build more inventory, you put more inventory in the channel.
What? Nothing about actually testing system interfaces thoroughly? Nothing at all really on the technical side, except the oblique reference to program management. As the CIO I mentioned above would say: "politically masterful."
Now, having said this, obviously HP's senior management felt the biggest failing was with respect to contingency planning, although the interview seems to indicate that there may have been broader reasons for their terminations than the ERP-related problems. My opinions of Bouchard's diplomacy notwithstanding, I do agree that the business needs to be responsible for foreseeable breakdowns of any sort, including those in IT; and they need to have an extended vision of what "foreseeable" means.
Charles Roxburgh of McKinsey & Company wrote an article last year on "Hidden Flaws in Strategy." The first common decision-making flaw he identifies is overconfidence. Overconfidence or overoptimism means that worst-case scenarios are usually too rosy, and the estimated odds of encountering negative scenarios are too low. Bouchard alludes to the problem in his interview: things went worse than anyone predicted they could.
Roxburgh outlines three steps to help cure overconfidence, and I would encourage you to apply these to assessing IT project risks.
- Test strategies under a much wider range of scenarios. But don’t give managers a choice of three, as they are likely to play safe and pick the central one. For this reason, the pioneers of scenario planning at Royal Dutch/Shell always insisted on a final choice of two or four options.
- Add 20 to 25 percent more downside to the most pessimistic scenario. Given our optimism, the risk of getting pessimistic scenarios wrong is greater than that of getting the upside wrong. The Lloyd’s of London insurance market, which has learned these lessons the hard, expensive way, makes a point of testing the market’s solvency under a series of extreme disasters, such as two 747 aircraft colliding over central London. Testing the resilience of Lloyd’s to these conditions helped it build its reserves and reinsurance to cope with the September 11 disaster.
- Build more flexibility and options into your strategy to allow the company to scale up or retrench as uncertainties are resolved. Be skeptical of strategies premised on certainty.
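As a rough illustration of the second step, here is the arithmetic applied to the figures from the interview. I am assuming, for the sake of the example, that the three weeks of planned disruption was HP's most pessimistic scenario; the interview does not say so, and the function name is mine.

```python
def padded_worst_case(worst_case, extra_downside=0.25):
    """Add extra downside (25 percent by default) to a worst-case estimate."""
    return worst_case * (1 + extra_downside)


planned_weeks = 3  # disruption HP planned for, per the interview
actual_weeks = 6   # disruption HP actually had, per the interview

padded_weeks = padded_worst_case(planned_weeks)
shortfall = actual_weeks - padded_weeks
print(f"Padded worst case: {padded_weeks} weeks; still short by {shortfall} weeks")
```

Under that assumption, even the padded figure falls well short of what actually happened, which suggests the starting "worst case" was itself too rosy, not just missing a safety margin.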
Note that although Lloyd's very severe worst-case scenario planning was thankfully sufficient, that worst case all but materialized in New York. They now need a new worst case, and it really needs to be the inconceivable, the unthinkable.
One possibility that is not mentioned here is how much schedule pressure had to do with the eventual outcome of the project. It surprises me that the system would go into production given the problems with the interfaces. The fact that they expected a certain period of serious business disruption leads me to believe that HP may have been aggressive in the timing of the project, or possibly extremely reluctant to extend the schedule to accommodate the project's problems. I have seen business and IT executives both demand that a certain date be met, often for no better reason than that the expectation has been set inside the organization, and it would look "bad" for them if the project were delayed. That is a very, very big gamble to take if the downside is any business discontinuity.
And this brings up another decision-making error Roxburgh identified: misestimating future hedonic states, by which philosophers and economists mean that we aren't very good at predicting what will make us happy or unhappy and for how long. In the situation above, maybe it really wouldn't be so very bad to have to push out the production date.
If you have comments about this topic, suggestions for future topics, or questions related to the governance of the IT function or the business-centric use of technology, feel free to e-mail me at eyetoIT@gmail.com.