As an IT leader either of a function or of a large IT team with multiple functions, what is the emphasis you should place on metrics and how are you able to leverage them to attain improvement and achieve more predictable delivery? What other foundational elements must be in place for you to effectively leverage the metrics? Which metrics are key measures or leading indicators and which ones are lagging or less relevant?
For those of you just joining, this is the fourth post on metrics and in our previous posts we focused on key aspects of IT metrics (transparency, benchmarking, and a scientific approach). You can find these posts in the archives or, better yet, in the pages linked under the home page menu of Best Practices. The intent of the RecipeforIT blog is to provide IT leaders with useful, actionable practices, techniques and guidance to be more successful IT leaders and enable their shops to achieve much higher performance. I plan to cover most of the key practices for IT leaders in my posts, and as a topic is covered, I try to migrate the material into Best Practices pages. So, now back to metrics.
In essence there are three types of relevant metrics:
- operational or primary metrics – those metrics used to monitor, track and make decisions about the daily or core work. Operational metrics are the base of effective management and are the fundamental measures of the activity being done. It is important that these metrics are collected inherently as part of the activity, and it is best if the operational team collects and understands them and changes direction based on them.
- verification or secondary metrics – those metrics used to verify that the work completed meets standards or is functioning as designed. Verification metrics should also be collected and reviewed by the same operational team, though potentially by different members of the team or through participation in a broader activity (e.g. a DR test). Verification metrics provide an additional measure of either overall quality or critical activity effectiveness.
- performance or tertiary metrics – those metrics that provide insight as to the performance of the function or activity. Performance metrics enable insight as to the team’s efficiency, timeliness, and effectiveness.
Of course, your metrics for a particular function should consist of those measures needed to successfully execute and manage the function as well as those measures that demonstrate progress towards the goals of your organization. For example, let’s take an infrastructure function: server management. What operational metrics should be in place? What should be verified on a regular basis? And what performance metrics should we have? While this will vary based on the maturity, scale, and complexity of the server team and environment, here is a good subset:
- Server asset counts (by type, by OS, by age, location, business unit, etc) and server configurations by version (n, n-1, n-2, etc) or virtualized/non-virtual and if EOL or obsolete
- Individual, grouped and overall server utilization, performance, etc by component (CPU, memory, etc)
- Server incidents, availability, customer impacts by time period, trended with root cause and with chronic or repeat issue areas identified
- Server delivery time, server upgrade cycle time
- Server cost overall and by type of server, by cost area (admin, maintenance, HW, etc) and cost by vendor
- Server backup attempt and completion, server failover in place, etc
- Monthly sample of the configuration management database server records for accuracy and completeness, ongoing scan of the network for servers not in the configuration management database, regular reporting of all obsolete server configs with callouts on those exceeding planned service or refresh dates
- Customer transaction times, regular (every six months) capacity planning and performance reviews of critical business service stacks including servers
- Root cause review of all significant impacting customer events, auto-detection of server issues versus manual or user detection ratios
- DR Tests, server privileged access and log reviews, regular monthly server recovery or failover tests (for a sample)
- Level of standardization or virtualization, level of currency/obsolescence
- Level of customer impact availability, customer satisfaction with performance, amount of headroom to handle business growth
- Administrators per server, Cost per server, Cost per business transaction
- Server delivery time, man hours required to deliver a server
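Several of the measures above reduce to simple arithmetic once the raw counts are being collected. The sketch below is only an illustration of that point, not a prescribed toolset: the `ServerStats` fields, the function names, and the sample figures are all hypothetical, and your own asset, incident and cost systems would supply the real inputs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ServerStats:
    """Hypothetical monthly inputs; every field name is an assumption."""
    server_count: int
    admin_count: int
    total_cost: float       # all server cost areas combined, one currency
    incidents_auto: int     # incidents first raised by monitoring
    incidents_manual: int   # incidents first reported by users or staff


def unregistered_servers(scanned: set[str], cmdb: set[str]) -> set[str]:
    """Verification: servers seen on a network scan but absent from the CMDB."""
    return scanned - cmdb


def sample_accuracy(checked: int, correct: int) -> float:
    """Verification: share of sampled CMDB records found accurate and complete."""
    if checked <= 0:
        raise ValueError("the sample must contain at least one record")
    return correct / checked


def auto_detection_ratio(stats: ServerStats) -> float:
    """Verification: fraction of incidents detected automatically."""
    total = stats.incidents_auto + stats.incidents_manual
    return stats.incidents_auto / total if total else 0.0


def admins_per_server(stats: ServerStats) -> float:
    """Performance: administrators per server."""
    if stats.server_count <= 0:
        raise ValueError("server_count must be positive")
    return stats.admin_count / stats.server_count


def cost_per_server(stats: ServerStats) -> float:
    """Performance: total server cost divided by server count."""
    if stats.server_count <= 0:
        raise ValueError("server_count must be positive")
    return stats.total_cost / stats.server_count


# Made-up figures, purely to show the calls.
stats = ServerStats(
    server_count=400,
    admin_count=8,
    total_cost=1_200_000.0,
    incidents_auto=45,
    incidents_manual=15,
)
missing = unregistered_servers({"srv-a", "srv-b", "srv-c"}, {"srv-a", "srv-c"})
```

The value is less in the arithmetic than in having the inputs captured as part of the daily work, so that the same numbers feed the operational, verification and performance views.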
Obviously, if you are just setting out, you will collect some of these metrics first. As you incorporate their collection and automate the work and reporting associated with them, you can then tackle the additional metrics. And you will vary them according to the importance of different elements in your shop. If cost is critical, then reporting on cost and efficiency plays such as virtualization will naturally be more important. If time to market or availability is critical, then those elements should receive greater focus. Below is a diagram that reflects the construct of the three types of metrics and their relationship to the different metrics areas and score cards:
So, you have your metrics framework. What else is required to leverage the metrics successfully?
First and foremost, the culture of your team must be open to alternate views and support healthy debate. Otherwise, no amount of data (metrics) or facts will enable the team to change direction from the party line. If you and your management team do not lead regular, fact-based discussions where course can be altered and different alternatives considered based on the facts and the results, you likely do not have the openness needed for this approach to be successful. Consider leading by example here and emphasizing fact-based discussions and decisions.
You must also have defined processes that are generally adhered to. If your group’s work is heavily ad hoc and different each time, measuring what happened the last time will not yield any benefits. If this is the case, you need to first focus on defining, even at a high level, the major IT processes and help your teams adopt them. Then you can proceed to metrics and the benefits they will bring.
Accountability, sponsorship and the willingness to invest in the improvement activities are also key factors in the speed and scope of the improvements that can occur. As a leader you need to maintain a personal engagement in the metrics reviews and score card results. They should feed into your team’s goals and you should monitor the progress in key areas. Your sponsorship, and senior business sponsorship where appropriate, will be major accelerators to progress. And hold teams accountable for their results and improvements within their domain.
How does this correlate with your experience with metrics? Any server managers out there that would have suggestions on the server metrics?
Best, Jim Ditmore