Monday, September 22, 2008

Lessons from the Financial Crisis

Yes, Grasshopper, the erosion of your stock portfolio holds valuable lessons that you can apply to recover all that money you have lost. No, this is not a post on investment strategies (although I would encourage you by all means to get advice on that from a reputable source); instead, I want you to learn how the market's ups and downs resemble the vulnerabilities in the applications you develop.

Wall Street has often been compared to a "well-oiled machine," but that analogy does not quite fit our line of work. Think of it instead as an application, and the parallels are not difficult to see. Just as in any enterprise application, financial markets have
✎ Users with different levels of access and privileges: traders, directors, investors, etc.
✎ Different types of data such as stocks, bonds, futures, cash.
✎ Operating modes like day trading, overnight activity, etc.
✎ Metrics or indices such as the Dow Jones Industrial Average, NASDAQ, etc.


Just as with any application, the amounts and types of data fluctuate; user actions are unpredictable, and so are the metrics. Despite all the alarming headlines, the financial markets have an outstanding record of correcting themselves. As applications go, Wall Street, with its numerous fail-safe mechanisms, has had an outstanding uptime record; name another application whose last major malfunction was in 1987.

Speaking of high availability, another very real application that has been very much online since its launch is Google, which, by the way, has made tons of money on Wall Street, but that's a whole other story. Google's vice president of engineering has stated that the secret of their success has been failure.

You will say that Google is doing everything but failing, and you would be wrong. Google is managing failure to its advantage. When failure is managed, it becomes unqualified success. Apart from its many projects in every area known to humanity, Google is at heart a data management company (OK, a search company) that deals with the challenge of storing and using large amounts of data despite the very real possibilities of disk, software, power, or human failure.

Most companies cobble together a data center and pray nothing falls apart. Google, on the other hand, deploys a server farm and waits, tools in hand, to deal with the first failure that crops up. Their success in the data center stems from a realistic acceptance of Murphy's Law. A very homey and appropriate analogy would be tending to a baby: you know you are going to have to change diapers, so why not stock up?
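
To put that Murphy's Law mindset in code terms, here is a minimal sketch in Java, with entirely hypothetical names, of what it looks like to treat failure as the expected case: the call is wrapped in a retry loop with a back-off instead of assuming the first attempt will succeed.

    import java.util.concurrent.Callable;

    public class ExpectFailure {

        // Retry a call a few times, backing off a little longer after each failure,
        // instead of assuming the first attempt will work.
        static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMillis)
                throws Exception {
            Exception lastFailure = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.call();
                } catch (Exception e) {
                    lastFailure = e; // remember why this attempt failed
                    System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                    Thread.sleep(backoffMillis * attempt); // wait a bit longer each time
                }
            }
            throw lastFailure; // give up only after the planned number of retries
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical flaky operation: fails twice, then succeeds.
            final int[] calls = {0};
            String result = callWithRetry(() -> {
                if (++calls[0] < 3) {
                    throw new RuntimeException("connection reset");
                }
                return "quote data";
            }, 5, 100);
            System.out.println("Got: " + result);
        }
    }

The particular loop is beside the point; what matters is that the failure path is written first, on purpose, rather than discovered in production.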

Why isn't this mindset more widespread in the IT community? Why does Google stand alone on an island of reliability surrounded by an ocean of mediocrity? In the software world, especially, why do we continue to design and write software as if everything will always work as expected, as if everything else in life did?

Based on experiences and observations I have been able to identify four possible reasons:

  1. Backup plans are a tough sell with management.

  2. Lack of foresight.

  3. Laziness.

  4. The belief that we can scam our way out of this one.


Let's examine each and see how we can deal with it.

Backup plans are a tough sell with management
In an environment where managers seem to do everything quarter to quarter, any plan that deals with long-term stability is not easy to get funding for. However, before we hurry to heap all the blame on the suits, let's see why we may have failed to get them on board with our disaster recovery plans.

A good place to start is to go outside of the IT department (a very refreshing thing to do; you should try it sometime). Let's take a look at the carpenter who is building the new bookshelf in the CFO's office. Notice his or her little "disaster recovery plan," expressed as steel reinforcements strategically hidden from view or jumbo-size rivets. Notice how the shelves are double layered with extra columnar support to handle the weight of numerous hard-bound volumes. Notice the extra layer of lacquer being applied so that the wood will resist casual scratches and retain its luster several CFOs down the road.

Have you ever wondered why all of these "extras" were simply included in the job once it was approved, and why there was no "selling" involved in getting the bookshelf reinforced? I know you are thinking to yourself, "this vain CFO has money to beautify his office, but is always penny-pinching with the IT department." But I want you to focus instead on how the cabinet maker presented his bid and how you presented yours.

One of the things that executives like about carpenters, interior decorators and the like is that they get one price and one delivery date — which, by the way, is usually met. Do you think, if the carpenter had given an estimate for the bookshelf without reinforcements and then tried to sell those separately, he would have gotten the necessary money? I don't think so.

The problem we have with software projects is that we ourselves are not sold on the need for security, proper testing, disaster recovery, and the like. We tack these onto our project proposals at the last minute, and the ambivalence with which we present them tells the stakeholders that these are just geeky nice-to-haves. Management will not back us if we ourselves are not sold on our own plans.


Lack of foresight
With very little variation, I have heard the refrain "this application was supposed to be temporary, but we have been using it ever since" so many times that I can say with abundant proof and absolute conviction that software, once deployed, will always last much longer than anticipated.

The scandalous part, however, is how nonchalantly many fail to look ahead even with major releases. The usual excuse is that there isn't enough time to engineer a given product properly. I understand; I know what it is to be under the gun. However, what about looking for ways "to oil the machine" once it has been set in motion? In these cases project managers, supervisors, and/or architects are especially to blame.

In the same way in which the members of the Mario Andretti racing team, when not building a new car, are continuously tuning the existing ones, development teams should be using their downtime to refactor, optimize and prepare code for the next release. A practice that I've observed in every successful development team has been to develop domain-specific APIs, code libraries and widgets that make it easier to build new applications or enhance the current ones.
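
As a purely illustrative sketch (the class and its rules are hypothetical, not taken from any real project), this is the kind of small, domain-specific building block a team might grow during its downtime, so that the next application does not have to reinvent currency rounding:

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    // A tiny, reusable domain helper: every application that uses it
    // rounds and adds currency amounts the same way.
    public final class Money {
        private final BigDecimal amount;

        private Money(BigDecimal amount) {
            // Normalize to two decimal places once, in one place.
            this.amount = amount.setScale(2, RoundingMode.HALF_EVEN);
        }

        public static Money of(String value) {
            return new Money(new BigDecimal(value));
        }

        public Money plus(Money other) {
            return new Money(this.amount.add(other.amount));
        }

        @Override
        public String toString() {
            return amount.toPlainString();
        }
    }

A shelf of such pieces is what turns downtime into a head start on the next release.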


Laziness
Sorry to put it so bluntly, but I have observed this one too many times. The laziness monster too often rears its head in poor coding practices and a flurry of porous "quick fixes" whose fallout down the road is never a small matter. Does this mean software can be made to always work, no matter what? No. Although that is an ideal we should aim for, there is always that obscure permutation no one had counted on. However, failures so constant that they hardly leave time to recover from the previous ones, and that seriously threaten mission-critical applications, are a sign of a corrosive IT culture. We need to develop teams of conscientious developers who will handle the company's code with the same integrity that we expect our accountants to apply to our investment portfolios and retirement accounts.
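
Here is a contrived example, with hypothetical names, of what a porous quick fix looks like next to a conscientious one: the first silences today's error and quietly leaks a bogus value into tomorrow's reports, while the second makes the missing case explicit so the caller has to deal with it.

    import java.util.Map;
    import java.util.Optional;

    public class QuickFixVersusFix {

        // The porous quick fix: swallow the problem and hand back a magic default.
        static double priceQuickFix(Map<String, Double> prices, String symbol) {
            try {
                return prices.get(symbol); // NullPointerException when the symbol is missing
            } catch (Exception e) {
                return 0.0;                // "fixed" -- and 0.0 now leaks into every report
            }
        }

        // The conscientious fix: make the missing case explicit and let the caller decide.
        static Optional<Double> price(Map<String, Double> prices, String symbol) {
            return Optional.ofNullable(prices.get(symbol));
        }

        public static void main(String[] args) {
            Map<String, Double> prices = Map.of("ACME", 42.50);
            System.out.println(priceQuickFix(prices, "MISSING")); // 0.0, silently wrong
            System.out.println(price(prices, "MISSING"));         // Optional.empty, visibly missing
        }
    }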



The belief that we can scam our way out of this one
Please, do yourself the favor of never promising something to management just because you know it will make you look good; the suits will eventually find out you lied to them, and you will fry sooner or later. Promise deliverables that can actually be delivered. If you are selling management a bill of goods you know you can't deliver, you are playing the part of a con artist, not an IT professional. Resist the temptation to appear superhuman; instead, listen to the stakeholders and provide them with a solution they can rely on. Tone down the whiz-bang; focus instead on solving problems and filling needs.


What about deadlines?
This question always comes up whenever the need for better software is raised, so I will address it. As a developer/architect and sometime supervisor, I have come to realize that another thing to be learned from the person building the bookshelf in the CFO's office is this: he or she will listen patiently to all the requirements and plans the client has and then, having clearly discussed the appearance and features of the finished product, plan with the client for a realistic delivery date.

So why is this not the case when it comes to IT? Too often, IT professionals alternate between being mice who just take notes while the stakeholders build a castle in the clouds and overwhelming the meeting with technical jargon that has little or no connection to the needs at hand. I have found that with managers and users alike, if we make it clear that we are on their side, they will accept our timelines and give us room for realistic deadlines.


Remember, IT is different only in technology. Quality and customer service are concepts that apply to our line of work as well as they apply to others. Let's learn from the winners.
