Moving from Offshoring to Global Shared Service Centers

My apologies for the delay in this post. It has been a busy few months, and it has taken extra time because there is quite a bit I wish to cover in the global shared service center model. Since my NCAA bracket has completely tanked, I am out of excuses not to complete the writing, so here is the first post, with at least one more to follow.

Since the mid-90s, companies have used offshoring to achieve cost and capacity advantages in IT. Offshoring was a favored option to address Y2K issues and has continued to expand at a steady rate over the past twenty years. But many companies still approach offshoring as 'out-tasking' and fail to leverage the many advantages of a truly global, high-performance workforce.

With out-tasking, companies take a limited set of functions or 'tasks' and move these to the offshore team. They often achieve initial economic advantage through labor arbitrage, and perhaps some improvement in quality as the tasks are documented and standardized to make it easier to transition the work to the new location. This constitutes the first level of a global team: the offshore service provider. But larger benefits are often lost, typically including:

  • further ongoing process improvement,
  • better time to market,
  • wider service times or ‘follow the sun’,
  • and leverage of critical innovation or leadership capabilities of the offshore team.

In fact, the work often stagnates at whatever state it was in when it was transitioned, with little impetus for further improvement. And because lower level tasks are often the work that is shifted offshore while higher level design work remains in the home country, key decisions on design or direction can take an extended period – actually lengthening time to market. Design or direction decisions often become arbitrary or disconnected because the groups – one in the home office, the other in the offshore location – retain significant divides (time of day, perspective, knowledge of the work, understanding of the corporate strategy, etc.). At its extreme, the home office becomes the ivory tower and the offshore teams become serf task executors and administrators. Ownership, engagement, initiative and improvement energies are usually lost in these arrangements. The problem can be further exacerbated by having contractors at the offshore location, who have a commercial interest in maintaining the status quo (and thus revenue) and who are viewed with less regard by the home country staff. Any changes required are used to increase contractor revenues and margins. These shortcomings erase many of the economic advantages of offshoring over time and further impact the competitiveness of the company in areas such as agility, quality, and leadership development.

A far better way to approach your workforce is to leverage a 'global footprint and a global team'. This approach is absolutely key for competitive advantage, and essential for competitive parity if you are an international company. There are multiple elements of the 'global footprint and team' approach that, when effectively orchestrated by IT leadership, can achieve far better results than any other structure. By leveraging a high-performance global approach, you can move from an offshore service provider to a shared service excellence center and, ultimately, to a global service leadership center.

The key elements of a global team approach can be grouped into two areas: high performance global footprint and high performance team. The global footprint elements are:

  • well-selected strategic sites, each with adequate critical mass, strong labor pools and higher education sources
  • proper positioning to meet time-of-day and improved skill and cost mix
  • knowledge and leverage of distinct regional advantages to obtain better customer interface, diverse inputs and designs, or unique skills
  • proper consolidation and segmentation of functions across sites to achieve optimum cost and capability mixes

Global team elements include:

  • consistent global goals and vision across global sites with commensurate rewards and recognition by site
  • a team structure that enables both integrated processes and local and global controls
  • the opportunity for growth globally from a junior position to a senior leader
  • close partnership with local universities and key suppliers at each strategic location
  • opportunity for leadership at all locations

Let's tackle global footprint today; in a follow-on post I will cover the global team. First and foremost is selecting the right sites for your company. Your current total staff size and locations will obviously factor heavily into your ultimate site mix. Assess your current sites using the following criteria:

  • Do they have critical mass (typically at least 300 engineers or operations personnel, preferably 500+) that will make the site efficient, productive and enable staff growth?
  • Is the site located where IT talent can be easily sourced? Are there good universities nearby to partner with? Are there business units co-located or customers nearby?
  • Is the site in a low, medium, or high cost location?
  • What is the shift (time zone) of the location?

Once you have classified your current sites with these criteria, you can then assess the gaps. Do you have sites in low-cost locations with strong engineering talent (e.g., India, Eastern Europe)? Do you have medium-cost locations (e.g., Ireland or 2nd tier cities in the US Midwest)? Do you have too many small sites (e.g., under 100 personnel)? Do you have sites close to key business units or customers? Are any sites located in 3rd shift time zones? Remember that your sites are more about the cities they are located in than the countries. A second tier city in India, or a first or second tier city in Eastern Europe, can often be your best site location because of improved talent acquisition and lower attrition than 1st tier locations in your home country or in India.

It is often best to locate your service center where there are strong engineering and business universities nearby that will provide an influx of entry level staff eager to learn and develop. Given staff will be the primary cost factor in your service, ensure you locate in lower cost areas that have good language skills, access to the engineering universities, and appropriate time zones. For example, if you are in Europe, you should look to have one or two consolidated sites located in or just outside 2nd tier cities with strong universities. Do not locate in Paris or London; instead, base your service desk in or just outside Manchester, Budapest, or Vilnius. This will enable you to tap into a lower cost yet high quality labor market that is also likely to provide more part-time workers who can help you cover peak call periods. You can use a similar approach in the US or Asia.

A highly competitive site structure also enables you to meet a globally optimal cost and capability mix. At the most mature global teams in very large companies, we drove for a 20/40/40 cost mix (20% high cost, 40% medium and 40% low cost) where each site is in a strong engineering location. Where possible, we also co-located with key business units. Drive to the optimal mix by selecting 3, 4, or 5 strategic sites that meet the mix target and that will also give you the greatest spread of shift coverage. Once you have located your sites correctly, you must then, of course, drive effective recruiting, training, and management of each site to achieve outstanding service. Remember also that you must properly consolidate functions to these strategic sites. Your key functions must be consolidated to 2 or 3 of the sites – you cannot run a successful function where there are multiple small units scattered around your corporate footprint. You will be unable to invest in the needed technology or provide an adequate career path to attract the right staff if the function is highly dispersed.

You can easily construct a matrix and assess your current sites against these criteria. Remember, these sites are likely among the most important investments your company will make. If you have a poor portfolio of sites – with inadequate labor resources, ineffective talent pipelines, or other issues – it will impact your company's ability to attract and retain its most important asset and to achieve competitive success. It may take substantial investment and an extended period of time, but achieving an optimal global site mix and global team will provide lasting competitive advantage.
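To make the assessment concrete, below is a minimal sketch (in Python) of how such a site matrix might be scored against the critical mass, talent, and cost criteria above and then checked against the 20/40/40 cost mix. The site names, headcounts, and scoring weights are illustrative assumptions, not recommendations – substitute your own portfolio and strategy.

    # Illustrative site assessment matrix. Site data and weights are hypothetical.
    SITES = [
        {"name": "Site A", "headcount": 650, "cost_tier": "low",    "universities": True},
        {"name": "Site B", "headcount": 90,  "cost_tier": "high",   "universities": False},
        {"name": "Site C", "headcount": 420, "cost_tier": "medium", "universities": True},
    ]

    def score_site(site):
        """Score a site on critical mass, talent pipeline, and cost position."""
        score = 0
        # Critical mass: 300+ engineers is workable, 500+ is preferred.
        if site["headcount"] >= 500:
            score += 2
        elif site["headcount"] >= 300:
            score += 1
        # Talent pipeline: nearby engineering/business universities to partner with.
        if site["universities"]:
            score += 1
        # Cost position: favor low- and medium-cost locations.
        score += {"low": 2, "medium": 1, "high": 0}[site["cost_tier"]]
        return score

    for site in sorted(SITES, key=score_site, reverse=True):
        print(f'{site["name"]}: score {score_site(site)} '
              f'({site["headcount"]} staff, {site["cost_tier"]} cost)')

    # Check the portfolio against the target 20/40/40 high/medium/low cost mix.
    total = sum(s["headcount"] for s in SITES)
    for tier in ("high", "medium", "low"):
        staff = sum(s["headcount"] for s in SITES if s["cost_tier"] == tier)
        print(f"{tier} cost: {staff / total:.0%} of staff")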

I will cover the global team aspects in my next post, along with the key factors in moving from an offshore service provider to shared service excellence to shared service leadership.

It would be great to hear your perspectives and any feedback on how you or your company have been either successful (or unsuccessful) at achieving a global team.

Best, Jim Ditmore


Keeping Score and What’s In Store for 2014

Now that 2013 is done, it is time to review my predictions from January last year. For those keeping score, I had six January predictions for Technology in 2013:

6. 2013 is the year of the ‘connected house’ as standards and ‘hub’ products achieve critical mass. Score: Yes! - A half dozen hubs were introduced in 2013 including Lowe’s and AT&T’s as well as SmartThings and Nest. The sector is taking off but is not quite mainstream as there is a bit of administration and tinkering to get everything hooked in. Early market share could determine the standards and the winners here.

5. The IT job market will continue to tighten, requiring companies to invest in growing talent as well as higher IT compensation. Score: Nope! - Surprisingly, while the overall unemployment rate declined from 7.9% to 7.0% over 2013, the tech sector's unemployment rate had a slight uptick from 3.3% to 3.9% in the 3rd quarter (4Q numbers not available). However, this uptick seems to be caused by more tech workers switching jobs (and thus quitting old jobs), perhaps due to more confidence and better pay elsewhere. Look for a continued tight supply of IT workers: the Labor Department predicts that another 1.4M IT workers will be required by 2020, with only 400K IT graduates produced during that time!

4. Fragmentation will multiply in the mobile market, leaving significant advantage to Apple and Samsung as the only companies commanding premiums for their products. Score: Yes and no - Fragmentation did occur in the Android segment, but the overall market consolidated greatly. And Samsung and Apple continued in 2013 to capture the lion's share of all profits from mobile and smart phones. Android picked up market share (and fragmented into more players), as did Windows Phone, notably in Europe. Apple dipped some, but the greatest drop was in 'other' devices (Symbian, Blackberry, etc). So expect a 2014 market dominated by Android, iOS, and a distant third to Windows Phone. And Apple will be hard pressed to come out with lower cost volume phones to encourage entry into their ecosystem. Windows Phone will need to continue to increase well beyond current levels, especially in the US or China, in order to truly compete.

3. HP will suffer further distress in the PC market both from tablet cannibalization and aggressive performance from Lenovo and Dell. Score: Yes! - Starting with the 2nd quarter of 2013, Lenovo overtook HP as the worldwide leader in PC shipments and then widened its lead in the 3rd quarter. Dell continued to outperform the overall market sector and finished a respectable second in the US and third in the world. Overall PC shipments continued to slide with an 8% drop from 2012, in large part due to tablets. Windows 8 did not help shipments, and there does not look to be a major resurgence in the market in the near term. Interestingly, as with smart phones, there is a major consolidation occurring around the top 3 vendors in the market — again, 'other' is the biggest loser of market share.

2. The corporate server market will continue to experience minimal increases in volume and flat or downward pressure on revenue. Score: Yes! - Server revenues declined year over year from 2012 to 2013 in the first three quarters (declines of 5.0%, 3.8%, and 2.1% respectively). Units shipped treaded water, with a decline in the first quarter of 0.7%, an uptick in the second quarter of 4%, and a slight increase in the third quarter of 2%. I think 2014 will show more robust growth with greater business investment.

1. Microsoft will do a Coke Classic on Windows 8. Score: Yes and no - Windows 8.1 did put back the Start button, but retained much of the ‘Metro’ interface. Perhaps best cast as the ‘Great Compromise’, Windows 8.1 was a half step back to the ‘old’ interface and a half step forward to a better integrated user experience. We will see how the ‘one’ user experience across all devices works for Microsoft in 2014.

So, the final score was: 3 came true, 2 mostly came true, and 1 did not – for a total score of 4. Not too bad, though I expected a 5 or 6 :) . I will do one re-check of the score when the end-of-year IT unemployment figures come out to see if the strengthening job market made up for the 3rd quarter dip.

As an IT manager, it is important to have strong, robust vendor competition – so it was good to see both Microsoft and HP come out swinging in 2013. Maybe they did not land many punches, but it is good to have them back in the game.

Given it is the start of the year, I thought I would map out some of the topics I plan to cover in my posts this coming year. As you know, the focus of Recipe for IT is useful best practice techniques and advice that works in the real world and enables IT managers to be more successful. In 2013, we had a very successful year with over 43,000 views from over 150 countries (most from the US, UK, India, and Canada). And I wish to thank the many who have contributed comments and feedback — it has really helped me craft a better product. So with that in mind, please provide your perspective on the upcoming topics, especially if there are areas you would like to see covered that are not.

For new readers, I have structured the site into two main areas: posts, which are short, timely essays on a particular topic, and reference pages, which often take a post and provide a more structured and possibly deeper view of the topic. The pages are intended to be an ongoing reference of best practice for you to leverage. You can reach the reference pages from the drop-down links on the home page.

For posts, I will continue the discussion on cloud and data centers. I will also explore flash storage and the continuing impact of mobile. Security will invariably be a topic. Some of you may have noticed some posts are placed first on InformationWeek and then subsequently here. This helps increase the exposure of Recipe for IT and also ensures good editing (!).

For the reference pages, I have recently refined and will continue to improve the production and quality areas. Look also for updates and improvements to leadership as well as the service desk.

What other topics would you like to see explored? Please comment and provide your feedback and input.

Best, and I wish you a great start to 2014,

Jim Ditmore


Celebrate 2013 Technology or Look to 2014?

The year is quickly winding down and 2013 will not be remembered as a stellar year for technology. Between the NSA leaks and Orwellian revelations, the Healthcare.gov mishaps, the cloud email outages (and Yahoo’s is still lingering) and now the 40 million credit identities stolen from Target, 2013 actually was a pretty tough year for the promise of technology to better society.

While the breakneck progress of technology continued, we witnessed so many shortcomings in its implementation. Fundamental gaps in large project delivery and in availability design and implementation continue to plague large and widely used systems. It is as if the primary design lessons of 'Galloping Gertie' regarding resonance were never absorbed by bridge builders. The costs of such major flaws in these large systems are certainly similar to those of a failed bridge. And as it turns out, if there is a security flaw or loophole, either the bad guys or the NSA will exploit it. I particularly like NSA's use of 'smiley faces' on internal presentations when they find a major gap in someone else's system.

So, given 2013 has shown the world we live in all too clearly, as IT leaders let’s look to 2014 and resolve to do things better. Let’s continue to up the investment in security within our walls and be more demanding of our vendors to improve their security. Better security is the number 2 focus item (behind data analytics) for most firms and the US government. And security spend will increase an out-sized amount even as total spend goes up by 5%. This is good news, but let’s ensure the money is spent well and we make greater progress in 2014. Of course, one key step is to get XP out of your environment by March since it will no longer be patched by Microsoft. For a checklist on security, here is a good start at my best practices security reference page.

As for availability, remember that quality provides the foundation of availability. Whether design, implementation or change, quality must be woven throughout these processes to enable robust availability and meet the demands of today's 7×24 mobile consumers. Resolve to move your shop from craft to science in 2014, and make a world of difference for your company's interface to its customers. Again, if you are wondering how best to start this journey and make real progress, check out this primer on availability.

Now, what should you look for in 2014? As with last January, where I made 6 predictions for 2013, I will make 6 technology predictions for 2014. Here we go!

6. There will be consolidation in the public cloud market as smaller companies fail to gather enough long term revenue to survive and compete in a market with rapidly falling prices. Nirvanix was the first of many.

5. NSA will get real governance, though it will be secret governance. There is too much of a firestorm for this to continue in current form.

4. Dual SIM phones become available in major markets. This is my personal favorite wish list item and it should come true in the Android space by 4Q.

3. Microsoft’s ‘messy’ OS versions will be reduced, but Microsoft will not deliver on the ‘one’ platform. Expect Microsoft to drop RT and continue to incrementally improve Pro and Enterprise to be more like Windows 7. As for Windows Phone OS, it is a question of sustained market share and the jury is out. It should hang on for a few more years though.

2. With a new CEO, a Microsoft breakup or spinoffs are in the cards. The activist shareholders are holding fire while waiting for the new CEO, but will be applying the flame once again. Effects? How about Office on the iPad? Everyone is giving away software and charging for hardware and services, forcing an eventual change in the Microsoft business model.

1. Flash revolution in the enterprise. What looked at the start of 2013 to be 3 or more years out now looks like this year. Flash storage at prices (with de-duplication) comparable to traditional storage, along with 90% reductions in environmentals, will turn adoption into a stampede, with the next generation of flash costing significantly less than disk storage.

What are your top predictions? Anything to change or add?

I look forward to your feedback and next week I will assess how my predictions from January 2013 did — we will keep score!

Best, and have a great holiday,

Jim Ditmore


How Did Technology End Up on the Sunday Morning Talk Shows?

It has been two months since the Healthcare.gov launch and by now nearly every American has heard or witnessed the poor performance of the websites. Early on, only one of every five users was able to actually sign in to Healthcare.gov, while poor performance and unavailable systems continue to plague the federal and some state exchanges. Performance was still problematic several weeks into the launch and even as of Friday, November 30, the site was down for 11 hours for maintenance. As of today, December 1, the promised ‘relaunch day’, it appears the site is ‘markedly improved’ but there are plenty more issues to fix.

What a sad state of affairs for IT. So, what do the Healthcare.gov issues teach us about large project management and execution? Or further, about quality engineering and defect removal?

Soon after the launch, former federal CTO Aneesh Chopra, in an Aspen Institute interview with The New York Times‘ Thomas Friedman, shrugged off the website problems, saying that “glitches happen.” Chopra compared the Healthcare.gov downtime to the frequent appearances of Twitter’s “fail whale” as heavy traffic overwhelmed that site during the 2010 soccer World Cup.

But given that the size of the signup audience was well known and that website technology is mature and well understood, how could the government create such an IT mess? Especially given how much lead time the government had (more than three years) and how much it spent on building the site (estimated between $300 million and $500 million).

Perhaps this is not quite so unusual. Industry research suggests that large IT projects are at far greater risk of failure than smaller efforts. A 2012 McKinsey study revealed that 17% of IT projects budgeted at $15 million or higher go so badly as to threaten the company's existence, and more than 40% of them fail. As bad as the U.S. healthcare website debut is, there are dozens of examples, both government-run and private, of similar debacles.

In a landmark 1995 study, the Standish Group established that only about 17% of IT projects could be considered “fully successful,” another 52% were “challenged” (they didn’t meet budget, quality or time goals) and 30% were “impaired or failed.” In a recent update of that study conducted for ComputerWorld, Standish examined 3,555 IT projects between 2003 and 2012 that had labor costs of at least $10 million and found that only 6.4% of them were successful.

Combining the inherent problems associated with very large IT projects with outdated government practices greatly increases the risk factors. Enterprises of all types can track large IT project failures to several key reasons:

  • Poor or ambiguous sponsorship
  • Confusing or changing requirements
  • Inadequate skills or resources
  • Poor design or inappropriate use of new technology

Unfortunately, strong sponsorship and solid requirements are difficult to come by in a political environment (read: Obamacare), where too many individual and group stakeholders have reason to argue with one another and change the project. Applying the political process of lengthy debates, consensus-building and multiple agendas to defining project requirements is a recipe for disaster.

Furthermore, based on my experience, I suspect the contractors doing the government work encouraged changes, as they saw an opportunity to grow the scope of the project with much higher-margin work (change orders are always much more profitable than the original bid). Inadequate sponsorship and weak requirements were undoubtedly combined with a waterfall development methodology and overall big bang approach usually specified by government procurement methods. In fact, early testimony by the contractors ‘cited a lack of testing on the full system and last-minute changes by the federal agency’.

Why didn't the project use an iterative delivery approach to hone requirements and interfaces early? Why not start with healthcare site pilots and betas months or even years before the October 1 launch date? The project was underway for three years, yet nothing was made available until October 1. And why did the effort leverage only an already occupied pool of virtualized servers that had little spare capacity for a major new site? For less than 10% of the project costs, a massive dedicated farm could have been built. Further, there was no backup site, nor were any monitoring tools implemented. And where was the horizontal scaling design within the application to enable easy addition of capacity for unexpected demand? It is disappointing to see such basic misses in non-functional requirements and design in a major program for a system that is not that difficult or unique.

These basic deliverables and approaches appear to have been fully missed in the implementation of the website. Further, the website code appears to have been quite sloppy, not even using common caching techniques to improve performance. Thus, in addition to suffering from weak sponsorship and ambiguous requirements, this program failed to leverage well-known best practices for the technology and design.

One would have thought that, given the scale and expenditure of the program, top technical resources would have been allocated to ensure these practices were used. The feds are scrambling with a "surge" of tech resources for the site. And while the new resources and leadership have made improvements so far, the surge will bring its own problems. It is very difficult to effectively add resources to an already large program. New ideas introduced by the 'surge' resources may not be either accepted or easily integrated. And if the issues are deeply embedded in the system, it will be difficult for the new team to fully fix the defects. For every 100 defects identified in the first few weeks, my experience with quality suggests there are 2 or 3 times more defects buried in the system. Furthermore, if the project couldn't handle the 'easy' technical work – sound website design and horizontal scalability – how will the team handle the more difficult challenges of data quality and security?

These issues will become more apparent in the coming months when the complex integration with backend systems from other agencies and insurance companies becomes stressed. And already the fraudsters are jumping into the fray.

So, what should be done and what are the takeaways for an IT leader? Clear sponsorship and proper governance are table stakes for any big IT project, but in this case more radical changes are in order. Why have all 36 states and the federal government roll out their healthcare exchanges in one waterfall or big bang approach? The sites that are working reasonably well (such as the District of Columbia’s) developed them independently. Divide the work up where possible, and move to an iterative or spiral methodology. Deliver early and often.

Perhaps even use competitive tension by having two contractors compete against each other for each such cycle. Pick the one that worked the best and then start over on the next cycle. But make them sprints, not marathons. Three- or six-month cycles should do it. The team that meets the requirements, on time, will have an opportunity to bid on the next cycle. Any contractor that doesn’t clear the bar gets barred from the next round. Now there’s no payoff for a contractor encouraging endless changes. And you have broken up the work into more doable components that can then be improved in the next implementation.

Finally, use only proven technologies. And why not ask the CIOs or chief technology architects of a few large-scale Web companies to spend a few days reviewing the program and designs at appropriate points. It’s the kind of industry-government partnership we would all like to see.

If you want to learn more about how to manage (and not to manage) large IT programs, I recommend "Software Runaways," by Robert L. Glass, which documents some spectacular failures. Reading the book is like watching a traffic accident unfold: It's awful but you can't tear yourself away. Also, I expand on the root causes of and remedies for IT project failures in my post on project management best practices.

And how about some projects that went well? Here is a great link to the 10 best government IT projects in 2012!

What project management best practices would you add? Please weigh in with a comment below.

Best, Jim Ditmore

This post was first published in late October in InformationWeek and has been updated for this site.


Whither Virtual Desktops?

The enterprise popularity of tablets and smartphones at the expense of PCs and other desktop devices is also sinking desktop virtualization. Beyond the clear evidence that tablets and smartphones are cannibalizing PC sales, mobility and changing device economics are also impacting corporate desktop virtualization, or VDI.

The heyday of virtual desktop infrastructure came around 2008 to 2010, as companies sought to cut their desktop computing costs — VDI promised savings from 10% to as much as 40%. Those savings were possible despite the additional engineering and server investments required to implement the VDI stack. Some companies even anticipated replacing up to 90% of their PCs with VDI alternatives. Companies sought to reduce desktop costs and address specific issues not well-served by local PCs (e.g., smaller overseas sites with local software licensing and security complexities).

But something happened on the way to VDI dominance. The market changed faster than VDI matured. Employee demand for mobile devices, in line with the BYOD phenomenon, has refocused IT shops on delivering mobile device management capabilities, not VDI. On-the-go employees are gravitating toward new lightweight laptops, a variety of tablets and other non-desktop innovations that aren't VDI-friendly. Mobile employees want to use multiple devices; they don't want to be tied down to a single VDI-based interface. And enterprise IT shops have refocused on delivering mobile device management capabilities so company employees can securely use their smartphones for their work. Given that the VDI interface is at best cumbersome on a touch device running a different OS than Windows, there will be less and less demand for VDI as the way to interconnect. And as the dominance of these highly mobile smartphones and tablets only increases in the next few years – with the client device war between Apple, Android, and Microsoft (Nokia) heating up further and continuing to produce better and cheaper products – VDI's appeal will fall even farther.

Meantime, PC prices, both desktop and laptop, which have already declined steadily over the past 4 years, dropping 30-40% (other than Apple's products, of course), will fall even faster. With the decline in shipments these past 18 months, the entire industry is over capacity, and the only way out of the situation is to spur demand and consumer interest in PCs through further cost reductions. (Note that the answer is not that Windows 8 will spur demand.) Already Dell and Lenovo are using lower prices to try to hold their volumes steady. And with other devices entering the market (e.g., smart TVs, smart game stations, etc.), it will become a very bloody marketplace. The end result for IT shops will be pretty slick $300 laptops that come complete with Windows (perhaps even Office). At those prices, VDI will have minimal or no cost advantage, especially taking into account the backend VDI engineering costs. And if you can buy a fully equipped $300 laptop or tablet that is preferred by most employees, IT shops will be hard pressed to pass that up and impose VDI. In fact, by late 2014, corporate IT shops could be faced with their VDI solutions costing more than traditional client devices (e.g., that $300 laptop). This is because the major components of VDI costs (servers and engineering work and support) will not drop nearly as quickly as the distressed-market PC costs.

There is no escaping the additional engineering time and attention VDI requires. The complex stack (either Citrix or VMware) still requires more engineering than a traditional solution. And with this complexity, there will still be bugs between the various client, VDI, and server layers that impact user experience. Recent implementations still show far too many defects between the layers. At Allstate, we have had more than our share of defects in our recent rollout between the virtualization layer, Windows, and third party products. And this is for what should be, by now, a mature technology.

Faced with greater costs, greater engineering resources (which are scarce) and employee demand for the latest mobile client devices, organizations will begin to throw in the towel on VDI. Some companies now deploying will reduce the scope of current VDI deployments. Some now looking at VDI will jump instead to mobile-only alternatives more focused on tablets and smartphones. And those with extensive deployments will allow significant erosion of their VDI footprint as internal teams opt for other solutions, employee demand moves to smartphones and tablets or lifecycle events occur. This is a long fall from the lofty goals of 90% deployment from a few years ago. IT shops do not want to be faced with both supporting VDI for an employee who also has a tablet, laptop or desktop solution because it essentially doubles the cost of the client technology environment. In an era of very tight IT budgets, excess VDI deployments will be shed.

One of the more interesting phenomena in the rapidly changing world of technology is when a technology wave gets overtaken well before it peaks. This has occurred many times before (think optical disk storage in the data center) but perhaps most recently with netbooks, whose primary advantages of cost and simplicity were overwhelmed by smartphones (from below) and ultrabooks (from above). Carving out a sustainable market niche on cost alone in the technology world is a very difficult task, especially when you consider that you are reversing long term industry trends.

Over the past 50 years of computing history, the intelligence and capability has been drawn either to the center or to the very edge. In the 60s, mainframes were the ‘smart’ center and 3270 terminals were the ‘dumb’ edge device. In the 90s, client computing took hold and the ‘edge’ became much smarter with PCs but there was a bulging middle tier of the three tier client compute structure. This middle tier disappeared as hybrid data centers and cloud computing re-centralized computing. And the ‘smart’ edge moved out even farther with smartphones and tablets. While VDI has a ‘smart’ center, it assumes a ‘dumb’ edge, which goes against the grain of long term compute trends. Thus the VDI wave, a viable alternative for a time, will be dissipated in the next few years as the long term compute trends overtake it fully.

I am sure there will still be niche applications, like offshore centers (especially where VDI also enables better control of software licensing), and there will still be small segments of the user population that will swear by the flexibility of accessing their desktop from anywhere they can log in without carrying anything, but these are long term niches. Long term, VDI solutions will have a smaller and smaller portion of the device share, perhaps 10%, maybe even 20%, but not more.

What is your company’s experience with VDI? Where do you see its future?

Best, Jim Ditmore

 This post was first published in InformationWeek on September 13, 2013 and has been slightly revised and updated.

Getting to Private Cloud: Key Steps to Build Your Cloud

Now that I am back from summer break, I want to continue to further the discussion on cloud and map out how medium and large enterprises can build their own private cloud. As we’ve discussed previously, software-as-a-service, engineered stacks and private cloud will be the biggest IT winners in the next five to ten years. Private clouds hold the most potential — in fact, early adopters such as JP Morgan Chase and Fidelity are seeing larger savings and greater benefits than initially anticipated.

While savings is a key driver in moving to private cloud, faster development cycles and better time to market are turning out to be both more significant and more valuable to early adopter firms than initially estimated. And it is not just a speed improvement but a qualitative improvement, where smaller projects can be trialled or riskier pilots can be executed with far greater speed and nominal costs. This allows a 'fast fail' approach to corporate innovation that greatly speeds the selection process, avoids extensive wasted investment in lengthier traditional pilots (that would have failed anyway) and greatly improves time to market on those ideas that are successful.

As for the larger savings, early implementations at scale are seeing savings well in excess of 50%. This is well beyond my estimate of 30% and is occurring in large part because of the vastly reduced labor requirements to build and administer a private cloud versus traditional infrastructure.

So with greater potential benefits, how should an IT department go about building a private cloud? The fundamental building blocks are a base of virtualized, commodity servers leveraging open systems. And of course you need the server engineering and administration expertise to support the platform. There's also a strong early trend toward leveraging open source software for private clouds, from the Linux operating system to OpenNebula and Eucalyptus for infrastructure management. But just having a virtualized server platform does not result in a private cloud. There are several additional elements required.

First, establish a set of standardized images that constitute most of the stack. Preferably, that stack will go from the hardware layer to the operating system to the application server layer, and it will include systems management, security, middleware and database. Ideally, go with a dozen or fewer server images and certainly no more than 20. Consider everything else to be custom, treated separately and differently from the cloud.
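As a simple illustration, the standard image catalogue can be as plain as a small, versioned data structure that your provisioning tooling reads. The image names and sizings below are hypothetical; the point is only that the list stays short and everything outside it is treated as custom.

    # Hypothetical catalogue of standard private cloud images (names and
    # contents are illustrative only).
    STANDARD_IMAGES = {
        "web-small": {"os": "Linux", "middleware": "Apache/Tomcat",   "vcpu": 2, "ram_gb": 8},
        "web-large": {"os": "Linux", "middleware": "Apache/Tomcat",   "vcpu": 8, "ram_gb": 32},
        "app-java":  {"os": "Linux", "middleware": "Java app server", "vcpu": 4, "ram_gb": 16},
        "db-medium": {"os": "Linux", "middleware": "Relational DB",   "vcpu": 8, "ram_gb": 64},
        "batch":     {"os": "Linux", "middleware": "Scheduler agent", "vcpu": 4, "ram_gb": 16},
    }

    # Anything beyond roughly a dozen images (certainly 20) should be treated as custom.
    assert len(STANDARD_IMAGES) <= 20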

Once you have established your target set of private cloud images, you should build a catalogue and ordering process that is easy, rapid, and transparent. The costs should be clear, and the server units should be processor-months or processor-weeks. You will need to couple the catalogue with highly automated provisioning and de-provisioning. Your objective should be to deliver servers quickly, certainly within hours, preferably within minutes (once the costs are authorized by the customer). And de-provisioning should be just as rapid and regular. In fact, you should offer automated 'sunset' servers in test and development environments (e.g., 90 days after the server(s) are allocated, they are automatically returned to the pool). I strongly recommend well-published and clear cost and allocation reporting to drive the right behaviors among your users. It will encourage quicker adoption, better and more efficient usage, and rapid turn-in when servers are no longer needed. With these 4 prerequisites in place (standard images, a catalogue and easy ordering process, clear costs and allocations, and automated provisioning and de-provisioning), you are ready to start your private cloud.
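Below is a minimal sketch of what the ordering and sunset flow might look like. The function names, fields, and 90-day window are illustrative assumptions; a real implementation would call your virtualization or cloud platform's API at the points noted in the comments.

    # Sketch of a self-service ordering flow with automated sunset for test/dev
    # servers. All names and fields are hypothetical.
    from datetime import datetime, timedelta

    SUNSET_DAYS = {"dev": 90, "test": 90, "prod": None}  # prod servers never auto-expire
    inventory = []

    def provision(image, env, owner, cost_center):
        """Record an approved request; a real system would trigger the automated build here."""
        expires = None
        if SUNSET_DAYS[env] is not None:
            expires = datetime.utcnow() + timedelta(days=SUNSET_DAYS[env])
        server = {"image": image, "env": env, "owner": owner,
                  "cost_center": cost_center, "expires": expires}
        inventory.append(server)
        return server

    def sweep_expired(now=None):
        """Return expired servers to the pool; run daily from a scheduler."""
        now = now or datetime.utcnow()
        expired = [s for s in inventory if s["expires"] and s["expires"] < now]
        for s in expired:
            inventory.remove(s)  # in practice: de-provision via the platform API
        return expired

    # Example: a 'web-small' dev server charged to a named cost center.
    provision("web-small", "dev", "jane.developer", "CC-1234")
    print(len(inventory), "servers allocated;", len(sweep_expired()), "reclaimed today")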

Look to build your private cloud in parallel to your traditional data center platforms. There should be both a development and test private cloud as well as a production private cloud. Seed the cloud with an initial investment of servers of each standard type. Then transition demand into the private cloud as new projects initiate, and grow it project by project.

You could begin by routing small and medium-size projects to the private cloud environment and, as it builds up scale and provisioning kinks are ironed out, migrate more and more server requests until nearly all requests are routed through your private cloud path. As you achieve scale and prove out your ordering, provisioning, and de-provisioning processes, you can begin to tighten the criteria for projects to proceed with traditional custom servers. Within 6 months, custom, traditional servers should be the rare exception and should be charged fully for the excess costs they will generate.

Once the private cloud is established you can verify the cost savings and advantages. And there will be additional advantages, such as improved time to market because of improvements in the speed of your development efforts, given server deployment is no longer the long pole in the tent. Well-armed with this data, you can now circle back and tackle existing environments and legacy custom servers. While the business case for a platform transition alone is often not a good investment, a transition to private cloud during another event (e.g., a major application release or server end-of-life migration) should easily become a winning investment. A few early adopters (such as JPMC or Fidelity) are seeing outsized benefits and strong developer push into these private cloud environments. So, if you build it well, you should be able to reap the same advantages.

How is your cloud journey proceeding? Are there other key steps necessary to be successful? I look forward to hearing your perspective.

Best, Jim Ditmore

 


Looking to Improve IT Production? How to Start

Production issues, as Microsoft and Google can tell you, impact even cloud email apps. A few weeks ago, Microsoft took an entire weekend to fully recover its cloud Outlook service. Perhaps you noted the issues earlier this year in financial services, where Bank of America experienced internet site availability issues. Unfortunately for Bank of America, that was their second outage in 6 months, though they are not alone in having problems, as Chase suffered a similar production outage on their internet services the following week. And these are regular production issues, not the unavailability of websites and services due to a series of DDoS attacks.

Perhaps 10 or certainly 15 years ago, such production outages would have drawn far less notice from customers, as front office personnel would have worked alternate systems and manual procedures until the systems were restored. But with customers accessing the heart of most companies' systems through internet and mobile applications, typically on a 7×24 basis, it is very difficult to avoid direct and widespread impact to customers in the event of a system failure. Your production performance becomes very evident to your customers. And your customers' expectations have continued to increase such that they expect your company and your services to be available pretty much whenever they want to use them. And while being available is not the only attribute that customers value (usability, features, service and pricing factor in importantly as well), companies that consistently meet or exceed consumer availability expectations gain a key edge in the market.

So how do you deliver to current and future rising expectations around availability of your online and mobile services? And if both BofA and Chase, which are large organizations that offer dozens of services online and have massive IT departments, have issues delivering consistently high availability, how can smaller organizations deliver compelling reliability?

And often, the demand for high availability must be achieved in an environment where ongoing efficiencies have eroded the production base and a tight IT labor market has further complicated obtaining adequate expertise. If your organization is struggling with availability or you are looking to achieve top quartile performance and competitive service advantage, here’s where to start:

First, understand that availability, at its root, is a quality issue. And quality issues can only be changed if you address all aspects. You must set quality and availability as a priority, as a critical and primary goal for the organization. And you will need to ensure that incentives and rewards are aligned to your team’s availability goal.

Second, you will need to address the IT change processes. You should look to implement an ITSM change process based on ITIL. But don't wait for a fully defined process to be implemented. You can start by limiting changes to appropriate windows. Establish release dates for major systems and accompanying subsystems. Avoid changes during key business hours or just before the start of the day. I still remember the 'night programmers' at Ameritrade at the beginning of our transformation there. Staying late one night as CIO in my first month, I noticed two guys come in at 10:30 PM. When I asked what they did, they said, 'We are the night programmers. When something breaks with the nightly batch run, we go in and fix it.' And this was done with no change records, minimal testing and minimal documentation. Of course, my hair stood on end hearing this. We quickly discontinued that practice and instead made changes as a team, after they were fully engineered and tested. I would note that combining this action with a number of other measures mentioned here enabled us to quickly reach a stable platform that had the best track record for availability among all online brokerages.

Importantly, you should ensure that adequate change review and documentation is being done by your teams for their changes. Ensure they take accountability for their work and their quality. Drive to an improved change process with templates for reviews, proper documentation, back-out plans, and validation. Most failed changes are due to issues with the basics: a lack of adequate review and planning, poor documentation of deployment steps, missing or ineffective validation, or one person doing an implementation in the middle of the night when you should have at least two people doing it together (one to do, and one to check).

Also, you should measure the proportion of incidents due to change. If you experience mediocre or poor availability and failed changes contribute to more than 30% of the incidents, you should recognize change quality is a major contributor to your issues. You will need to zero in on the areas with chronic change issues. Measure the change success rate (percentage of changes executed successfully without production incident) of your teams. Publish the results by team (this will help drive more rapid improvement). Often, you can quickly find which of your teams has inadequate quality because their change success rate ranges from a very poor mid-80s percentage to a mediocre mid-90s percentage. Good shops deliver above 98% and a first quartile shop consistently has a change success rate of 99% or better.
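For illustration, the change success rate is a straightforward calculation once you can pull change records, and whether each caused an incident, from your ITSM tool. The records and team names below are invented; the 98% threshold reflects the 'good shop' benchmark above.

    # Illustrative change success rate by team; real data would come from your ITSM tool.
    changes = [
        {"team": "Payments", "caused_incident": False},
        {"team": "Payments", "caused_incident": False},
        {"team": "Payments", "caused_incident": True},
        {"team": "Online",   "caused_incident": False},
        {"team": "Online",   "caused_incident": False},
    ]

    def change_success_rate(records, team):
        """Percentage of a team's changes executed without causing a production incident."""
        team_changes = [c for c in records if c["team"] == team]
        successes = sum(1 for c in team_changes if not c["caused_incident"])
        return successes / len(team_changes)

    for team in sorted({c["team"] for c in changes}):
        rate = change_success_rate(changes, team)
        flag = "OK" if rate >= 0.98 else "needs attention"  # first quartile shops run at 99%+
        print(f"{team}: {rate:.1%} change success rate ({flag})")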

Third, ensure all customer impacting problems are routed through an enterprise command center via an effective incident management process. An Enterprise Command Center (ECC) is basically an enterprise version of a Network Operations Center or NOC, where all of your systems and infrastructure are monitored (not just networks). And the ECC also has capability to facilitate and coordinate triage and resolution efforts for production issues. An effective ECC can bring together the right resources from across the enterprise and supporting vendors to diagnose and fix production issues while providing communication and updates to the rest of the enterprise. Delivering highly available systems requires an investment into an ECC and the supporting diagnostic and monitoring systems. Many companies have partially constructed the diagnostics or have siloed war rooms for some applications or infrastructure components. To fully and properly handle production issues requires consolidating these capabilities and extending their reach.  If you have an ECC in place, ensure that all customer impacting issues are fully reported and handled. Underreporting of issues that impact a segment of your customer base, or the siphoning off of a problem to be handled by a local team, is akin to trying to handle a house fire with a garden hose and not calling the fire department. Call the fire department first, and then get the garden hose out while the fire trucks are on their way.

Fourth, you must execute strong root cause analysis and follow-up. These efforts must be at the individual issue or incident level as well as at a summary or higher level. It is important not just to focus on fixing the individual incident and getting to root cause for that one incident, but also to look for the overall trends and patterns of your issues. Are they clustered with one application or infrastructure component? Are they caused primarily by change? Does a supplier contribute far too many issues? Is inadequate testing a common thread among incidents? Are your designs too complex? Are you using the products in a mainstream or unique manner – especially if you are seeing many OS or product defects? Use these patterns and analysis to identify the systemic issues your organization must fix. They may be process issues (e.g., poor testing), application or infrastructure issues (e.g., obsolete hardware), or other issues (e.g., lack of documentation, incompetent staff). Track both the fixes for individual issues as well as the efforts to address systemic issues. The systemic efforts will begin to yield improvements that eliminate future issues.
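A short sketch of this kind of pattern analysis follows, using made-up incident records; in practice the data would come from your incident management system and the categories from your own cause taxonomy.

    # Sketch of simple pattern analysis across incidents to surface systemic issues.
    from collections import Counter

    incidents = [
        {"component": "online-banking", "cause": "failed change"},
        {"component": "online-banking", "cause": "capacity"},
        {"component": "payments",       "cause": "failed change"},
        {"component": "online-banking", "cause": "failed change"},
    ]

    by_component = Counter(i["component"] for i in incidents)
    by_cause = Counter(i["cause"] for i in incidents)

    print("Incidents by component:", by_component.most_common())
    print("Incidents by cause:", by_cause.most_common())

    # Rule of thumb from above: failed changes driving more than 30% of incidents
    # points to change quality as a major contributor.
    change_share = by_cause["failed change"] / len(incidents)
    print(f"Failed changes: {change_share:.0%} of incidents")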

These four efforts will set you on a solid course to improved availability. If you couple these efforts with diligent engagement by senior management and disciplined execution, the improvements will come slowly at first, but then will yield substantial gains that can be sustained.

You can achieve further momentum with work in several areas:

  • Document configurations for all key systems.  If you are doing discovery during incidents it is a clear indicator that your documentation and knowledge base is highly inadequate.
  • Review how incidents are reported. Are they user reported or did your monitoring identify the issue first? At least 70% of the issues should be identified first by you, and eventually you will want to drive this to a 90% level. If you are lower, then you need to look to invest in improving your monitoring and diagnostic capabilities.
  • Do you report availability in technical measures or business measures? If you report via time-based systems availability measures or the number of incidents by severity, these are technical measures. You should look to implement business-oriented measures, such as customer-impact availability, to drive greater transparency and more accurate metrics.
  • In addition to eliminating issues, reduce your customer impacts by reducing the time to restore service (Microsoft can certainly stand to consider this area given their latest outage was three days!). For mean time to restore (MTTR – note this is not mean time to repair but mean time to restore service), there are three components: time to detect (MTTD), time to diagnose or correlate (MTTC), and time to fix, i.e. restore service (MTTF). An IT shop that is effective at resolution normally will see MTTR at 2 hours or less for its priority issues, where the three components each take about 1/3 of the time (a worked sketch of this breakdown follows this list). If your MTTD is high, again look to invest in better monitoring. If your MTTC is high, look to improve correlation tools, systems documentation or engineering knowledge. And if your MTTF is high, again look to improve documentation or engineering knowledge or automate recovery procedures.
  • Consider investing in greater resiliency for key systems. It may be that customer expectations of availability exceed current architecture capabilities. Thus, you may want to invest in greater resiliency and redundancy or build a more highly available platform.
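As a worked illustration of the MTTR breakdown above, here is a small sketch with invented timings; the two-hour benchmark and the roughly equal thirds come from the text, everything else is assumption.

    # Worked sketch of the MTTR breakdown (detect + diagnose/correlate + fix)
    # for a set of priority incidents. Timings are invented for illustration.
    incidents = [
        {"detect": 20, "diagnose": 45, "fix": 40},   # minutes spent in each phase
        {"detect": 10, "diagnose": 30, "fix": 35},
        {"detect": 60, "diagnose": 50, "fix": 55},
    ]

    n = len(incidents)
    mttd = sum(i["detect"] for i in incidents) / n
    mttc = sum(i["diagnose"] for i in incidents) / n
    mttf = sum(i["fix"] for i in incidents) / n
    mttr = mttd + mttc + mttf

    print(f"MTTD {mttd:.0f} min, MTTC {mttc:.0f} min, MTTF {mttf:.0f} min -> MTTR {mttr:.0f} min")
    # An effective shop keeps MTTR for priority incidents at ~120 minutes or less,
    # with each component contributing roughly a third. A high MTTD points at
    # monitoring gaps; a high MTTC at correlation tools and documentation; a high
    # MTTF at recovery automation and engineering knowledge.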

As you can see, providing robust availability for your customers is a complex endeavor. By implementing these steps, you can enable sustainable and substantial progress to top quartile performance and achieve business advantage in today’s 7×24 world.

What would you add to these steps? What were the key factors in your shop’s journey to high availability?

Best, Jim Ditmore


Turning the Corner on Data Centers

Recently I covered the 'green shift' of servers, where each new server generation not only drives major improvements in compute power but also requires about the same or even less in environmentals (power, cooling, space) as the previous generation. Thus, compute efficiency, or compute performance per watt, is improving exponentially. And this trend in servers, which started in 2005 or so, is also being repeated in storage. We have seen a similar improvement in power per terabyte for the past 3 generations (since 2007). The current storage product pipeline suggests this efficiency trend will continue for the next several years. Below is a chart showing representative improvements in storage efficiency (power per terabyte) across storage product generations from a leading vendor.

[Chart: Power (VA) per terabyte across storage product generations]

With current technology advances, a terabyte of storage on today's devices requires approximately 1/5 of the power of a device from 5 years ago. And these power requirements could drop even more precipitously with the advent of flash technology. By some estimates, there is a drop of 70% or more in power and space requirements with the switch to flash products. In addition to being far more power efficient, flash will offer huge performance advantages for applications, with corresponding time reductions in completing workload. So expect flash storage to quickly convert the market once mainstream product introductions occur. IBM sees this as just around the corner, while other vendors see the flash conversion as 3 or more years out. In either scenario, there are continued major improvements in storage efficiency in the pipeline that deliver far lower power demands even with increasing storage requirements.

Ultimately, with the combined efficiency improvements of both storage and server environments over the next 3 to 5 years, most firms will see a net reduction in data center requirements. Typical corporate data center power requirements are approximately one half server, one third storage, and the rest network and other devices. With the two biggest components experiencing dramatic ongoing power efficiency gains, net power and space demand should decline in the coming years for all but the fastest growing firms. Add in the effects of virtualization, engineered stacks and SaaS, and the data centers in place today should suffice for most firms if they maintain a healthy replacement pace of older technology and embrace virtualization.
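As a rough, back-of-the-envelope illustration of why net demand can decline, the sketch below combines the approximate power mix above with assumed annual capacity growth and efficiency gains. The growth and efficiency rates are illustrative assumptions, not forecasts.

    # Back-of-the-envelope model of net data center power demand. The component
    # mix follows the text; growth and efficiency rates are assumptions.
    baseline = {"server": 0.50, "storage": 0.33, "network_other": 0.17}          # share of power today
    capacity_growth = {"server": 0.15, "storage": 0.25, "network_other": 0.05}   # assumed annual demand growth
    efficiency_gain = {"server": 0.20, "storage": 0.25, "network_other": 0.00}   # assumed annual power-per-unit gain

    def projected_power(years):
        total = 0.0
        for component, share in baseline.items():
            growth = (1 + capacity_growth[component]) ** years
            efficiency = (1 - efficiency_gain[component]) ** years
            total += share * growth * efficiency
        return total

    for years in (1, 3, 5):
        print(f"Year {years}: {projected_power(years):.2f}x of today's power draw")
    # With these assumptions, efficiency gains on servers and storage outpace
    # capacity growth, so total power demand drifts downward over five years.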

Despite such improvements in efficiency, we could still see a major addition in total data center space because cloud and consumer firms like Facebook are investing major sums in new data centers. This consumer data center boom also shows the effects of growing consumerization in the technology marketplace. Consumerization, which started with PCs and PC software, and then moved to smart phones, has impacted the underlying technologies dramatically. The most advanced compute chips are now those developed for smart phones and video games. Storage technology demand and advances are driven heavily by smart phones and products like the MacBook Air, which already leverage only flash storage. The biggest and best data centers? No longer the domain of corporate demand; instead, consumer demand (e.g., Gmail, Facebook) drives bigger and more advanced centers. The proportion of data center space dedicated to direct consumer compute needs (a la Gmail or Facebook) versus enterprise compute needs (even for companies that provide direct consumer services) will see a major shift from enterprise to consumer over the next decade. This will follow the shifts in chips and storage that at one time were driven by the enterprise space (and previously, the government) and are now driven by the consumer segment. And it is highly likely that there will be a surplus of enterprise class data centers (50K – 200K square feet of raised floor) in the next 5 years. These centers are too small and inefficient for a consumer data center (500K – 2M square feet or larger), and with declining demand and consolidation effects, plenty of enterprise data center space will be on the market.

As an IT leader, you should ensure your firm is riding these compute and storage efficiency trends. Multiply the demand reduction further by leveraging virtualization, engineered stacks and SaaS (where appropriate). If you have a healthy buffer of data center space now, these measures could let you avoid major data center investments and costs for the next five to ten years. Those monies can instead be spent on functional investments that drive more direct business value, or drop to the bottom line of your firm. If you have excess data centers, I recommend consolidating quickly and disposing of the space as soon as possible; these assets will be worth far less in the coming years given the likely oversupply. Perhaps you can partner with a cloud firm looking for data center space if your asset is strategic enough for them. Conversely, if you have minimal buffer and see continued high business growth, it may be possible to acquire good data center assets at far lower unit cost than in the past.

For 40 years, technology has ridden Moore’s Law to yield ever-more-powerful processors at lower cost. Its compounding effects have been astounding, and we are now seeing nearly 10 years of similar compounding on the power efficiency side of the equation (below is a chart showing advances in both processor compute power and compute power efficiency).

Trend Change for Power Efficiency

The chart above shows how compute efficiency (performance per watt, the green line) has shifted dramatically from its historical trend (blue lines). And it is improving about as fast as compute performance itself (red lines), perhaps even faster.
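
As a back-of-the-envelope illustration of what that compounding means, here is a one-line calculation; the 18-month doubling period is an assumption for the sketch, not a figure taken from the chart.

```python
# If performance per watt doubles roughly every 18 months (an illustrative
# assumption), the compounding over a decade is dramatic.
doubling_period_years = 1.5
years = 10
improvement = 2 ** (years / doubling_period_years)
print(f"~{improvement:.0f}x more compute per watt after {years} years")  # ~102x
```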

These server and storage advances have resulted in fundamental changes in data centers and their demand trends for corporations. Top IT leaders will take advantage of these trends and be able to direct more IT investment into business functionality and less into the base utility costs of the data center, while still growing compute and storage capacities to meet business needs.

What trends are you seeing in your data center environment? Can you turn the corner on data center demand? Are you able to meet your current and future business needs and growth within your current data center footprint and avoid adding capacity?

Best, Jim Ditmore

Posted in Best Practices, Cloud, Data Centers, Efficiency and Cost Reduction

Using Organizational Best Practices to Handle Cloud and New Technologies

I have extended and updated this post, which was first published in InformationWeek in March 2013. I think it describes a salient and pragmatic organizational approach for IT success. I look forward to your feedback! Best, Jim

IT organizations are challenged to keep up with the latest wave of cloud, mobile and big data technologies, which are outside the traditional areas of staff expertise. Some industry pundits recommend bringing on more technology “generalists,” since cloud services in particular can call on multiple areas of expertise (storage, server, networking). Or they recommend employing IT “service managers” to bundle up infrastructure components and provide service offerings.

But such organizational changes can reduce your team’s expertise and accountability and make it more difficult to deliver services. So how do you grow your organization’s expertise to handle new technologies? And at the same time, how do you organize to deliver on business demands for more product innovation and faster delivery while still ensuring efficiency, high quality and security?

Rather than acquire generalists and add another layer of cost and decision making to your infrastructure team, consider the following:

Cloud computing. Assign architects or lead engineers to focus on software-as-a-service and infrastructure-as-a-service, ensuring that you have robust estimating and costing models and solid implementation and operational templates. Establish a cloud roadmap that leverages SaaS and IaaS, ensuring that you don’t overreach and end up balkanizing your data center.
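
As one concrete (and deliberately simplified) example of the estimating and costing models those architects and lead engineers should own, the sketch below compares monthly IaaS and on-premise hosting costs. Every rate and parameter is a hypothetical placeholder, not a real price from any provider.

```python
# Minimal IaaS vs. on-premise monthly cost comparison sketch.
# Every rate and parameter below is a hypothetical placeholder.

def iaas_monthly_cost(vm_count, hourly_rate=0.12, hours=730,
                      storage_tb=10, storage_rate_per_tb=25):
    """Pay-as-you-go compute plus block storage."""
    return vm_count * hourly_rate * hours + storage_tb * storage_rate_per_tb

def onprem_monthly_cost(vm_count, server_capex=8000, vms_per_server=20,
                        amortization_months=48, power_cooling_per_server=120,
                        admin_cost_per_server=200):
    """Amortized hardware plus run costs for a virtualized on-premise farm."""
    servers = -(-vm_count // vms_per_server)  # ceiling division
    return servers * (server_capex / amortization_months
                      + power_cooling_per_server + admin_cost_per_server)

for vms in (50, 200, 1000):
    print(f"{vms:>5} VMs: IaaS ~${iaas_monthly_cost(vms):,.0f}/mo, "
          f"on-prem ~${onprem_monthly_cost(vms):,.0f}/mo")
```

The real models are of course richer (network egress, licensing, migration cost, utilization), but keeping even a simple, agreed model like this prevents the overreach and data center balkanization mentioned above.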

For appliances and private cloud, given their multiple component technologies, let your best component engineers learn adjacent fields. Build multi-disciplinary teams to design and implement these offerings. Above all, though, don’t water down the engineering capacity of your team by selecting generalists who lack depth in a component field. For decades, IT has built complex systems with multiple components by leveraging multi-faceted teams of experts, and cloud is no different.

Where to use ‘service managers’. A frequent flaw in organizations is to employ ‘service managers’ who group multiple infrastructure components (e.g. storage, servers, data centers) into a ‘product’ (e.g. a ‘hosting service’) and provide direction and an interface for that product. This is an entirely artificial layer that removes accountability from the component teams and often makes poor ‘product’ decisions because of its limited knowledge and depth. In the end, IT does not deliver ‘hosting services’; IT delivers systems that meet business functions (e.g., for banking, teller or branch functions and ATMs; for insurance, claims reporting or policy quote and issue). These business functions are the true IT services, and they are where you should apply a service manager role. Here, a service manager can ensure end-to-end integration and quality, drive better overall transaction performance and reliability, and provide deep expertise on system connections, SLAs and business needs back across the application and infrastructure component teams. And because the role is attached directly to the business functions being delivered, it yields high value. These service managers will be invaluable for new development and enhancement work as well as for assisting during production issues.

Mobile. If mobile isn’t already the most critical interface for your company, it will be in three to five years. So don’t treat mobile as an afterthought, to be adapted from traditional interfaces. And don’t outsource this capability, as mobile will be pervasive in everything you build.

Build a mobile competency center that includes development, user experience and standards expertise. Then fan out that expertise to all of your development teams, while maintaining the core mobile group to assist with the most difficult efforts. And of course, continue with a central architecture and control of the overall user experience. A consistent mobile look, feel and flow is essentially your company’s brand, invaluable in interacting with customers.

Big data. There are two key aspects of this technology wave: the data (and traditional analytic uses) and real-time data “decisioning,” similar to IBM’s Watson. You can handle the data analytics as an extension of your traditional data warehousing (though on steroids). However, real-time decisioning has the potential to dramatically alter how your organization specifies and encodes business rules.

Consider the possibility that 30% to 50% of all business logic traditionally encoded in third- or fourth-generation programming languages instead becomes decisioned in real time. This capability will require new development and business analyst skills. For now, cultivate a central team with these skills. As you pilot and determine how to leverage real-time data decisioning more broadly, decide how to seed your wider development teams with these capabilities. In the longer run, I believe it will be critical to have these skills as an inherent part of each development team.
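
To make the distinction tangible, here is a minimal sketch contrasting a rule hard-coded in a program with the same rule expressed as data that a decision service evaluates at run time; the loan rule and thresholds are invented purely for illustration.

```python
# Contrast: business logic hard-coded in a 3GL vs. expressed as data that a
# decision service evaluates at run time. Rules and thresholds are invented.

# Traditional: the rule lives in code and needs a release to change.
def approve_loan_hardcoded(applicant):
    return applicant["credit_score"] > 680 and applicant["debt_ratio"] < 0.4

# Decisioned: the rule lives in data (or a model) and can change in real time.
RULES = [
    {"field": "credit_score", "op": "gt", "value": 680},
    {"field": "debt_ratio",   "op": "lt", "value": 0.4},
]
OPS = {"gt": lambda a, b: a > b, "lt": lambda a, b: a < b}

def approve_loan_decisioned(applicant, rules=RULES):
    return all(OPS[r["op"]](applicant[r["field"]], r["value"]) for r in rules)

applicant = {"credit_score": 705, "debt_ratio": 0.32}
print(approve_loan_hardcoded(applicant), approve_loan_decisioned(applicant))
```

The business analyst skill shift is exactly this: specifying rules (or models) as data that can be tuned continuously, rather than as logic buried in release cycles.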

Competing Demands. Overall, IT organizations must meet several competing demands: work with business partners to deliver competitive advantage; do so quickly in order to respond to (and anticipate) market demands; and provide efficient, consistent quality while protecting the company’s intellectual property, data and customers. In essence, there are business and market drivers that value speed, business knowledge and closeness at a reasonable cost, and there are risk drivers that value efficiency, quality, security and consistency.

Therefore, we must design an IT organization and systems approach that meets both sets of drivers and accommodates business organizational change. As opposed to organizing around one set of drivers or the other, the best solution is to organize IT as a hybrid organization to deliver both sets of capabilities.

Typically, the functions that should be consolidated and organized centrally to deliver scale, efficiency and quality are infrastructure (especially networks, data centers, servers and storage), IT operations, information security, service desks and anything else that should be run as a utility for the company. The functions to be aligned and organized along business lines to promote agility and innovation are application development (including Web and mature mobile development), data marts and business intelligence.

Some functions, such as database, middleware, testing and project management, can be organized in either mode. But if they aren’t centralized, they’ll require a council to ensure consistent processes, tools, measures and templates.

For services becoming a commodity, or where there’s a critical advantage to having one solution (e.g., one view of the customer for the entire company), it’s best to have a single team or utility that’s responsible (along with a corresponding single senior business sponsor). Where you’re looking to improve speed to market or market knowledge, organize into smaller IT teams closer to the business. The diagram below gives a graphical view of the hybrid organization.

The IT Hybrid Model diagram
With this approach, your IT shop will be able to deliver the best of both worlds. And you can then weave in the new skills and teams required to deliver the latest technologies such as cloud and mobile. You can read more about this hybrid model in our best practice reference page.

Which IT organizational approaches or variations have you seen work best? How are you accommodating new technologies and skills within your teams? Please weigh in with a comment below.

Best, Jim Ditmore

Posted in Best Practices, Building High Performance Teams, Vision and Leadership

Beyond Big Data

Today’s post on Big Data is authored by Anthony Watson, CIO of Europe, Middle East Retail & Business Banking at Barclays Bank. It is a thought-provoking take on ‘Big Data’ and how best to use it effectively. Please look past the atrocious British spelling :). We look forward to your comments and perspective. Best, Jim Ditmore

In March 2013, I read with great interest the results of a University of Cambridge analysis of some 58,000 Facebook profiles. The results predicted unpublished information such as the gender, sexual orientation, religious and political leanings of the profile owners. In one of the biggest studies of its kind, scientists from the university’s psychometrics team developed algorithms that were 88% accurate in predicting male sexual orientation, 95% for race and 80% for religion and political leanings. Personality types and emotional stability were also predicted with accuracy ranging from 62% to 75%. The experiment was conducted over the course of several years through their MyPersonality website and Facebook application. You can sample a limited version of the method for yourself at http://www.YouAreWhatYouLike.com.
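
The study’s actual features and models are not reproduced here, but as a rough sketch of the general technique (predicting a hidden attribute from a user-by-page ‘likes’ matrix with a simple classifier, assuming numpy and scikit-learn are available), with entirely synthetic data:

```python
# Sketch of predicting a profile attribute from "likes" with a simple
# classifier. Data is synthetic; the Cambridge study's features and models
# are not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_pages = 2000, 500
likes = rng.integers(0, 2, size=(n_users, n_pages))          # user x page "like" matrix
weights = rng.normal(size=n_pages)                            # hidden relationship
attribute = (likes @ weights + rng.normal(scale=2, size=n_users)) > 0

X_train, X_test, y_train, y_test = train_test_split(likes, attribute, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Holdout accuracy: {model.score(X_test, y_test):.0%}")
```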

Not surprisingly, Facebook declined to comment on the analysis, but I guarantee you none of this information is news to anyone at Facebook. In fact it’s just the tip of the iceberg. Without a doubt the good people of Facebook have far more complex algorithms trawling, interrogating and manipulating its vast and disparate data warehouses, striving to give its demanding user base ever richer, more unique and distinctly customised experiences.

As an IT leader, I’d have to be living under a rock to have missed the “Big Data” buzz. Vendors, analysts, well-intentioned executives and even my own staff – everyone seems to have an opinion lately, and most of those opinions imply that I should spend more money on Big Data.

It’s been clear to me for some time that we are no longer in the age of “what’s possible” when it comes to Big Data. Big Data is “big business” and the companies that can unlock, manipulate and utilise data and information to create compelling products and services for their consumers are going to win big in their respective industries.

Data flow around the world and through organisations is increasing exponentially and becoming highly complex; we’re dealing with greater and greater demands for storing, transmitting, and processing it. But in my opinion, all that is secondary. What’s exciting is what’s being done with it to enable better customer service and bespoke consumer interactions that significantly increase value along all our service lines in a way that was simply not possible just a few years ago. This is what’s truly compelling. Big Data is just a means to an end, and I question whether we’re losing sight of that in the midst of all the hype.

Why do we want bigger or better data? What is our goal? What does success look like? How will we know if we have attained it? These are the important questions and I sometimes get concerned that – like so often before in IT – we’re rushing (or being pushed by vendors, both consultants and solution providers alike) to solutions, tools and products before we really understand the broader value proposition. Let’s not be a solution in search of a problem. We’ve been down that supply-centric road too many times before.

For me it’s simple: innovation starts with demand. Demand is the force that drives innovation. However, this should not be confused with the axiom that “necessity is the mother of invention”. When it comes to technology, we live in a world where invention and innovation are defining the necessity and the demand. It all starts with a value experience for our customers. Only through a deep understanding of what “value” means to the customer can we truly be effective in searching out solutions. This understanding requires an open mind and the innovative resolve to challenge the conventions of “how we’ve always done it.”

Candidly, I hate the term “Big Data”. It is marketing verbiage, coined by Gartner, that covers a broad ecosystem of problems, tools, techniques, products and solutions. If someone suggests you have a Big Data problem, that doesn’t say much, as arguably any company operating at scale, in any industry, will have some sort of challenge with data. But beyond tagging all these challenges with the term Big Data, you’ll find little in common across diverse industries, products or services.

Given this diversity across industries and within organisations, how do we construct anything resembling a Big Data strategy? We have to stop thinking about the “supply” of Big Data tools, techniques and products peddled by armies of overeager consultants and solution providers. For me, technology simply enables a business proposition. We need to look upstream, to the demand. Demand presents itself in business terms. In financial services, for example, you might ask:

  • Who are our most profitable customers and, most importantly, why?
  • How do we increase customer satisfaction and drive brand loyalty?
  • How do we take excess and overbearing processes out of our supply chain and speed up time to market/service?
  • How do we reduce our losses to fraud without increasing compliance & control costs?

Importantly, asking these questions may or may not lead us down a Big Data road. But we have to start there. The next set of questions is not about solutions either, but about framing the demand and the potential solutions:

  • How do we understand the problem today? How is it measured? What would improvement look like?
  • What works in our current approach, in terms of the business results? What doesn’t? Why? What needs to improve?
  • Finally, what are the technical limitations in our current platforms? Have new techniques and tools emerged that directly address our current shortcomings?
  • Can we develop a hypothesis and an experimental approach to test whether these new techniques truly deliver an improvement?
  • Having conducted the experiment, what did we learn? What should we abandon, and what should we move forward with?

There’s a system to this. Once we go through the above process, we start the cycle over. In a nutshell, it’s the process of continuous improvement. Some of you will recognise the well-known cycle of Plan, Do, Check, Act (“PDCA”) in the above.

Continuous improvement and PDCA are interesting in that they are essentially the scientific method applied to business. Fittingly, one of the notable components of the Big Data movement is the emerging role of the Data Scientist.

So, who can help you assess this? Who is qualified to walk you through the process of defining your business problems and solving them through innovative analytics? I think it is the Data Scientist.

What’s a Data Scientist? It’s not a well-defined position, but an ideal candidate would have:

  • Hands-on experience with building and using large and complex databases, relational and non-relational, and in the fields of data architecture and information management more broadly.
  • Solid applied statistical training, grounded in a broader context of mathematical modeling.
  • Exposure to continuous improvement disciplines and industrial theory.
  • Most importantly: a functional understanding of whatever industry is paying their salary, i.e., real-world operational experience. Theory is valuable; “scar tissue” is essential.

This person should be able to model data, translate that model into a physical schema, load that schema from sources and write queries against it, but that’s just the start. One semester of introductory stats isn’t enough. They need to know what tools to use and when, and the limits and trade-offs of those tools. They need to be rigorous in their understanding and communication of the confidence levels in their models and findings, and cautious about the inferences they draw.
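
A trivial example of what communicating confidence levels means in practice: report the estimate together with its interval, not a bare number (the figures below are made up).

```python
# Report an estimate with its confidence interval, not a bare number.
# The figures are made up.
import math

conversions, visitors = 130, 2000
p = conversions / visitors
se = math.sqrt(p * (1 - p) / visitors)
low, high = p - 1.96 * se, p + 1.96 * se   # ~95% normal-approximation interval
print(f"Conversion rate: {p:.1%} (95% CI {low:.1%} to {high:.1%})")
```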

Some of the Data Scientist’s core skills are transferable, especially at the entry level. But at higher levels, they need to specialise. Vertical industry problems are rich, challenging and deep. For example, an expert in call centre analytics would most certainly struggle to develop comparable skills in supply chain optimisation or workforce management.

And ultimately, they need to be experimentalists: true scientists on a quest for knowledge on behalf of their company or organisation, with an insatiable curiosity, engaged in a continuous cycle of:

  • examining the current reality,
  • developing and testing hypotheses, and
  • delivering positive results for broad implementation so that the cycle can begin again.

There are many sectors we can apply Big Data techniques to: financial services, manufacturing, retail, energy, and so forth. There are also common functional domains across the sectors: human resources, customer service, corporate finance, and even IT itself.

IT is particularly interesting. It’s the largest consumer of capital in most enterprises. IT represents a set of complex concerns that are not well understood in many enterprises: projects, vendors, assets, skilled staff, and intricate computing environments. All these come together to (hopefully) deliver critical and continuous value in the form of agile, stable and available IT services for internal business stakeholders, and most importantly external customers.

Given the criticality of IT, it’s often surprising how poorly managed IT is in terms of data and measurement. Does IT represent a Big Data domain? Yes, absolutely. From the variety of IT deliverables, artefacts and inventories, to the velocity of IT events feeding management consoles, to the volume of archived IT logs, IT itself is challenged by Big Data. IT is a microcosm of many business models. And we in IT don’t do ourselves any favours by starting from a supply perspective here, either. IT’s legitimate business questions include:

  • Are we getting the IT we’re paying for? Do we have unintentional redundancy in what we’re buying? Are we paying for services not delivered?
  • Why did that high severity incident occur and can we begin to predict incidents?
  • How agile are our systems? How stable? How available?
  • Is there a trade-off between agility, stability and availability? How can we increase all three?

With the money spent on IT, and its operational criticality, Data Scientists can deliver value here as well. The method is the same: understand the current situation, develop and test new ideas, implement the ones that work, and watch results over time as input into the next round.

For example, the IT organisation might be challenged by a business problem of poor stakeholder trust, due to real or perceived inaccuracies in IT cost recovery. It is then determined that these inaccuracies stem from poor data quality for the IT assets on which cost recovery is based.

Data Scientists can explain that without an understanding of data quality, one does not know what confidence a model merits. If quality cannot be improved, the model remains more uncertain. But often, the quality can be improved. Asking “why” – perhaps repeatedly – may uncover key information that assists in turn with developing working and testable hypotheses for how to improve. Perhaps adopting master data management techniques pioneered for customer and product data will assist. Perhaps measuring IT asset data quality trends over time is essential to improvement; people tend to focus on what is being measured and called out in a consistent way. Ultimately, this line of inquiry might result in the acquisition of a toolset like Blazent, which provides IT analytics and data quality solutions enabling a true end-to-end view of the IT ecosystem. Blazent is a toolset we’ve deployed at Barclays to great effect.
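
As a minimal sketch of what measuring IT asset data quality over time could look like, the snippet below scores an inventory for completeness; the field names, records and prior monthly scores are hypothetical.

```python
# Score an IT asset inventory for completeness and track the score over time.
# Field names, records and prior scores are hypothetical.
from datetime import date

REQUIRED_FIELDS = ["hostname", "owner", "cost_centre", "location", "os_version"]

def quality_score(assets):
    """Fraction of required fields populated across all asset records."""
    total = len(assets) * len(REQUIRED_FIELDS)
    filled = sum(1 for a in assets for f in REQUIRED_FIELDS if a.get(f))
    return filled / total if total else 0.0

assets = [
    {"hostname": "srv001", "owner": "payments", "cost_centre": "CC12",
     "location": "DC-East", "os_version": "RHEL 6"},
    {"hostname": "srv002", "owner": None, "cost_centre": "CC12",
     "location": "DC-East", "os_version": None},
]

history = {date(2013, 6, 1): 0.78, date(2013, 7, 1): 0.83}  # prior months
history[date(2013, 8, 1)] = quality_score(assets)           # this month: 0.8
print(history)
```

Publishing such a trend consistently is often enough to start improving it, which is the behavioural point made above.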

Similarly, a Data Scientist schooled in data management techniques, and with an experimental, continuous improvement orientation, might look at an organisation’s recurring problems in diagnosing and fixing major incidents and recommend that analytics be deployed against the terabytes of logs accumulating every day, both to improve root cause analysis and, ultimately, to proactively predict outage scenarios based on previous outage patterns. Vendors like Splunk and Prelert might be brought in to assist with this problem at the systems management level. SAS has worked with text analytics across incident reports in safety-critical industries to identify recurring patterns of issues.
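
A toy version of the underlying idea (not how Splunk, Prelert or SAS actually work) is to flag intervals whose error counts deviate sharply from a recent baseline; the log counts below are invented.

```python
# Toy illustration of log-based anomaly detection: flag intervals whose error
# counts deviate sharply from the recent baseline. Real tools are far more
# sophisticated; the counts below are invented.
import statistics

hourly_error_counts = [12, 9, 15, 11, 10, 13, 14, 12, 95, 11]  # errors per hour

def flag_anomalies(counts, window=6, threshold=3.0):
    anomalies = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mean, stdev = statistics.mean(baseline), statistics.pstdev(baseline)
        if stdev and (counts[i] - mean) / stdev > threshold:
            anomalies.append((i, counts[i]))
    return anomalies

print(flag_anomalies(hourly_error_counts))  # hour 8's spike (95) is flagged
```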

It all starts with business benefit and value. The Big Data journey must begin with the end in mind, not with a rush to purchase vehicles before the terrain and destination are known. A Data Scientist, or at least someone operating with a continuous improvement mindset who will champion this cause, is an essential component. So, rather than just talking about “Big Data”, let’s talk about “demand-driven data science”. If we take that as our rallying cry and driving vision, we’ll go much further in delivering compelling, demonstrable and sustainable value in the end.

Best, Anthony Watson

Posted in Best Practices, Big Data, Decision Sciences, Looking Ahead