A Scientific Approach to IT Metrics

In order to achieve a world class or first quartile performance, it is critical to take a ‘scientific’ approach to IT metrics. Many shops remain rooted in ‘craft’ approaches to IT where techniques and processes are applied in an ad hoc manner to the work at hand and little is measured. Or, a smattering of process improvement methodologies (such as Six Sigma or Lean) or development approaches (e.g., Agile) are applied indiscriminately across the organization. Frequently then, due to a lack of success, the process methods or metrics focus are then tarred as being ineffective by managers.

Most organizations that I have seen that were mediocre performers typically have such craft or ad hoc approaches to their metrics and processes. And this includes not just the approach at the senior management level but at each of the 20 to 35 distinct functions that make up an IT shop (e.g., Networking, mainframes, servers, desktops, service desk, middleware , etc, and each business-focused area of development and integration). In fact, you must address the process and metrics at each distinct function level in order to then build a strong CIO level process, governance and metrics. And if you want to achieve 1st quartile or world-class performance, a scientific approach to metrics will make a major contribution. So let’s map out how to get to such an approach.

1) Evaluate your current metrics: You can pick several of the current functions you are responsible for and evaluate them to see where you are in your metrics approach and how to adjust to apply best practices. Take the following steps:

  • For each distinct function, identify the current metrics that are routinely used by the team to execute their work or make decisions.
  • Categorize these metrics as either operational metrics or reporting numbers. If they are not used by the team to do their daily work or they are not used routinely to make decisions on the work being done by the team, then these are reporting numbers. For example, they may be summary numbers reported to middle management or reported for audit or risk requirements or even for a legacy report that no one remembers why it is being produced.
  • Is a scorecard being produced for the function? An effective scorecard would have quantitative measures for the deliverables of the functions as well as objective scores for function goals that have been properly cascaded for the overall IT goals

2) Identify gaps with the current metrics: For each of IT functions there should be regular operational metrics for all key dimensions of delivery (quality, availability, cost, delivery against SLAs, schedule). Further, each area should have unit measures to enable an understanding of performance (e.g., unit cost, defects per unit, productivity, etc). As an example, the server team should have the following operational metrics:

    • all server asset inventory and demand volumes maintained and updated
    • operational metrics such as server availability, server configuration currency, server backups, server utilization should all be tracked
    • also time to deliver a server, total server costs, and delivery against performance and availability SLAs should be tracked
    • further secondary or verifying metrics such as server change success, server obsolescense, servers with multiple backup failures, chronic SLA or availability misses, etc should be tracked as well
    • function performance metrics should include cost per server (by type of server), administrators per server, administrator hours to build a server, percent virtualized servers, percent standardized servers, etc should also be derived

3) Establish full coverage: By comparing the existing metrics against the full set of delivery goals, you can quickly establish the appropriate operational metrics along with appropriate verifying metrics. Where there are metrics missing that should be gathered, work with the function to incorporate the additional metrics into their daily operational work and processes. Take care to work from the base metrics up to more advanced:

    • start with base metrics such as asset inventories and staff numbers and overall costs before you move to unit costs and productivity and other derived metrics
    • ensure the metrics are gathered in as automated a fashion as possible and as an inherent part of the overall work (they should not be gathered by a separate team or subsequent to the work being done

Ensure that verifying metrics are established for critical performance areas for the function as well. An example of this for the server function could be for the key activity of backups for a server:

    • the operational metrics would be perhaps backups completed against backups scheduled
    • the verifying metric would be twofold:
      • any backups for a single server that fail twice in a row get an alert and an engineering review as to why they failed (typically, for a variety of reasons 1% or fewer of your backups will fail, this is reasonable operational performance. But is one server does not get a successful backup for many days, you are likely putting the firm at risk if there is a database or disk failure, thus the critical alert)
      • every month or quarter, 3 or more backups are selected at random, and the team ensures they can successfully recover from the backup files. This will verify everything associated with the backup is actually working.

4) Collect the metrics only once: Often, teams collect similar metrics for different audiences. The metrics that they use to monitor for example configuration currency or configuration to standards, can be mostly duplicated by risk data collected against security parameter settings or executive management data on a percent server virtualization. This is a waste of the operational team’s time and can lead to confusing reports where one view doesn’t match another view. I recommend that you establish and overall metrics framework that includes risk and quality metrics as well as management and operational metrics so that all groups agree to the proper metrics. The metrics are then collected once, distilled and analyzed once, and congruent decisions can then be made by all groups. Later this week I will post a recommended metrics framework for a typical IT shop.

5) Drop the non-value numbers activity: For all those numbers that were identified as being gathered for middle management reports or for legacy reports with an uncertain audience; if there is no tie to a corporate or group goal, and the numbers are not being used by the function for operational purposes, I recommend to stop collecting the numbers and stop publishing any associated reports. It is non-value activity.

6) Use the metrics in regular review: At both the function team level and function management level the metrics should be trended, analyzed and discussed. These should be regular activities: monthly, weekly, and even daily depending on the metrics. The focus should be on how to improve, and based on the trends are current actions, staffing, processes, etc, enabling the team to improve and be successful on all goals or not. A clear feedback loop should be in place to enable the team and management to identify actions to take place to correct issues apparent through the metrics as quickly and as locally as possible. This gives control of the line to the team and the end result is better solutions, better work and better quality. This is what has been found in manufacturing time and again and is widely practiced by companies such as Toyota in their factories.

7) Summarize the metrics from across your functions into a scorecard: Ensure you identify the key metrics within each function and properly summarize and aggregate the metrics into an overall group score card. Obviously the score card should match you goals and key services that you deliver. It may be appropriate to rotate in key metrics from a function based on visibility or significant change. For example, if you are looking to improve overall time to market(TTM) of your projects, it may be appropriate to report on server delivery time as a key subcomponent and hopefully leading indicator of your improving TTM.  Including on your score card, even at a summarized level, key metrics from the various functions, will result in greater attention and pride being taken in the work since there is a very visible and direct consequences. I also recommend that on a quarterly basis, that you provide an assessment as to the progress and perhaps highlights of the team’s work as reflected in the score card.

8 ) Drive better results through proactive planning: The team and function management, once the metrics and feedback loop are in place, will be able to drive better performance through ongoing improvement as part of their regular activity. Greater increases in performance may require broader analysis and senior management support. Senior management should do proactive planning sessions with the function team to enable greater improvement to occur. The assignment for the team should be how take key metrics and what would be required to set them on a trajectory to a first quartile level in a certain time frame. For example, you may have both a cost reduction goal overall and within the server function there is a subgoal to achieve greater productivity (at a first quartile level)  and reduce the need for additional staff. By asking the team to map out what is required and by holding a proactive planning session on some of the key metrics (e.g. productivity) you will often identify the path to meet both local objectives that also contribute to the global objectives. Here, in the server example, you may find that with a moderate investment in automation, productivity can be greatly improved and staff costs reduced substantially. Thus both objectives could be obtained by the investment.  By holding such proactive sessions, where you ask the team to try and identify what would needs to be done to achieve a trajectory on their key metrics as well as considering what are the key goals and focus at the corporate or group level, you can often identify such doubly beneficial actions.

By taking these steps, you will employ a scientific approach to your metrics. If you add a degree of process definition and maturity, you will make significant strides to controlling and improving your environment in a sustainable way. This will build momentum and enable your team to enter a virtuous cycle of improvement and better performance. And then if add to the mix process improvement techniques (in moderation and with the right technique for each process and group), you will accelerate your improvement and results.

But start with your metrics and take a scientific approach. In the next week, I will be providing metrics frameworks that have stood well in large, complex shops along with templates that should help the understanding and application of the approach.

What metrics approaches have worked well for you? What keys would you add to this approach? What would you change?

Best, Jim

One thought on “A Scientific Approach to IT Metrics”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.