Release Management Best Practices

Release management is the process of planning, scheduling, and controlling a software build through its different stages and environments, including testing and deploying the release. It combines development and production activities, and it interfaces with an organization’s change management process to implement the release into production.

In medium-sized and large organizations, initiatives grow in complexity and size, and their implementation into production carries correspondingly greater criticality and a larger potential impact if done poorly. Such implementations exceed the scope of typical project management or change management processes, which are geared to one project and one system or component. For efforts that cross multiple components with significant functional or design changes, the disciplines of program management and release management are employed to increase quality and success rates.

Mature release management enables greater complexity, size and quality of software that is implemented into production. With successful release management, productivity, communication, and coordination are improved, and the organization can deliver software faster while decreasing risk, thus increasing overall software speed or agility. These improvements mean the team can repeatedly produce quality software with shorter times to market, which allows the company to be more responsive to the operating environment.

Release management also helps standardize and streamline the development and operations process. Properly implemented release management, with auditable release controls, ensures effective documentation of decisions and builds a repository for all releases throughout the life cycle. A single, well-documented process that is followed for all releases allows teams to continuously improve production implementations, draw more useful lessons from experience, and apply them in future releases. Mature teams can repeatedly deliver releases involving hundreds of thousands of hours of effort with only a handful of minor defects and no serious ones.

The Release Management Process

Release management starts during the design phase. If it is a one-time initiative managed by a program, then release management is normally an activity under that program. If the work is a repeated cycle of builds and implementations for a software product or service, then release management interfaces with project management and product leadership as a separate activity. Normally, a release manager is appointed to handle each release separately. The release manager ensures proper definition of the release and then coordinates implementation projects and activities across the supporting systems and components to ensure a smooth and integrated delivery.

In the design phase, the release is defined and the key functions and capabilities to be delivered are described (these will later become the release notes). Further, all involved systems and components are identified in the design phase. Their work is mapped out and prerequisite tasks are identified. This includes mapping dependencies and concurrent activities. These activity maps enable the implementation steps and window to then be drafted. 

The implementation window draft is used to identify potentially impacted users and customers. The release manager then coordinates discussions with user groups and representatives to ensure proper placement of change windows and to ensure impacts are acceptable. If the release is a repeat occurrence (e.g. monthly or quarterly), then the change windows and expected impacts should already be established. These discussions then occur only for a particularly large or complex release that requires a larger window.

As the release build proceeds under program or project management, the release manager also builds the release plan. The plan is a joint product of the build teams, the test team, and operations. Given the defined window, the implementation task dependencies, and the estimated times for implementation, the release tasks are defined and orchestrated. When the plan is constructed, it is important to identify potential precursor tasks. These tasks (e.g. backups, index rebuilds, cabling, hardware installations, operating software upgrades) should, wherever possible, be executed before the release window and fully verified and confirmed as effective. This separation of multiple complex activities greatly reduces risk and the potential overrun of the release window. For example, if a backup or index rebuild that is planned to be done before the release window fails, the release can easily be deferred until it is fully understood why that activity failed and the issues are resolved. Otherwise, the team will be in the middle of multiple activities at various stages of completion, and it will be difficult to retreat to a known stable state. Once all tasks, including precursor tasks, are defined and mapped, the draft plan is circulated and reviewed by all parties, and a draft timeline that can be leveraged by the implementation lead and operations is then constructed.
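
The dependency mapping and the separation of precursor tasks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the task names, the `TASKS` table, and the `order_release_tasks` helper are hypothetical and not part of any particular release tool. The standard library's `graphlib.TopologicalSorter` (Python 3.9+) does the dependency ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical release tasks: name -> (prerequisite task names, is_precursor).
# Precursor tasks are meant to run, and be verified, before the release window.
TASKS = {
    "verify_backup": (set(), True),
    "rebuild_indexes": ({"verify_backup"}, True),
    "upgrade_os": ({"verify_backup"}, True),
    "quiesce_business_activity": (set(), False),
    "deploy_application": (
        {"quiesce_business_activity", "upgrade_os", "rebuild_indexes"},
        False,
    ),
    "verify_services": ({"deploy_application"}, False),
}


def order_release_tasks(tasks):
    """Return (precursor_tasks, window_tasks), each in dependency order."""
    graph = {name: prereqs for name, (prereqs, _) in tasks.items()}
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    ordered = list(TopologicalSorter(graph).static_order())

    # A precursor task cannot depend on work done inside the release window.
    for name, (prereqs, is_precursor) in tasks.items():
        if is_precursor:
            blocked_by = [p for p in prereqs if not tasks[p][1]]
            if blocked_by:
                raise ValueError(f"{name} depends on window tasks: {blocked_by}")

    precursors = [n for n in ordered if tasks[n][1]]
    window = [n for n in ordered if not tasks[n][1]]
    return precursors, window


if __name__ == "__main__":
    before_window, in_window = order_release_tasks(TASKS)
    print("Before the window:", before_window)
    print("Inside the window:", in_window)
```

If any task in the first list fails its verification, the release can be deferred before anything in the second list has started.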

Once the build cycles are complete, the release enters into the test phase with draft release notes, release activities and plan, and release timeline. During the test phase, release management begins coordinating with change management and ensuring all activities are properly entered into the change management system. Normally for large releases, changes are not entered in isolation but are captured under a release ‘parent’ that is linked to all of the change records (and activities) for that release. Even for minor releases, such linkage is helpful. Typically mature shops have at least 3 or 4 types of change records: administrative for minor and routine changes typically associated with infrastructure configuration changes; regular changes which are independent and not closely linked with other changes; functional or minor release changes which are linked or associated with changes in other systems and teams; and major release changes which have numerous changes and activities across multiple systems and teams that must be coordinated closely. 
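
As a rough illustration of the release ‘parent’ linkage, the sketch below models the four change record types named above and a parent that collects the change records for one release. The class names, field names, and identifiers such as `REL-2024-Q3` are assumptions made for the example; a real change management system will have its own schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class ChangeType(Enum):
    ADMINISTRATIVE = "administrative"  # minor, routine configuration changes
    REGULAR = "regular"                # independent changes
    MINOR_RELEASE = "minor_release"    # linked to changes in other systems
    MAJOR_RELEASE = "major_release"    # many coordinated changes


@dataclass
class ChangeRecord:
    change_id: str
    system: str
    change_type: ChangeType
    approved: bool = False


@dataclass
class ReleaseParent:
    """Hypothetical parent record that links every change in one release."""
    release_id: str
    changes: list[ChangeRecord] = field(default_factory=list)

    def link(self, change: ChangeRecord) -> None:
        self.changes.append(change)

    def unapproved(self) -> list[str]:
        """Change IDs still awaiting approval."""
        return [c.change_id for c in self.changes if not c.approved]

    def systems(self) -> list[str]:
        """Every system touched by the release, sorted by name."""
        return sorted({c.system for c in self.changes})


if __name__ == "__main__":
    release = ReleaseParent("REL-2024-Q3")
    release.link(ChangeRecord("CHG-1", "billing", ChangeType.MAJOR_RELEASE, approved=True))
    release.link(ChangeRecord("CHG-2", "crm", ChangeType.MAJOR_RELEASE))
    print(release.systems(), release.unapproved())
```

Keeping the linkage in one place lets the release manager see at a glance which systems are involved and which change records are still open.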

During the test phase, training and documentation are also built and provided to operations (for handling and systems processing changes), to the service desks (to handle changed business processes), and to the end user teams (e.g., branch offices). If changes to underlying systems and components are significant, operational instrumentation will need to be updated as well. This should also be tested and verified during the test phase.

To reduce risks and ensure data integrity, the release manager strengthens the release plan. Typical measures include: 

  • verified backups are taken prior to the start of the release 
  • business activity is quiesced prior to implementation of changes
  • appropriate checkpoints are part of the release plan
  • back out plans for critical stages have been defined and are workable
  • verification activities of critical steps are included
  • proper monitoring of all components occurs during the release 
  • comprehensive verification of changes and services is performed at the end of the release window
  • early, start-of-day verification and checkout routines are performed before the next production day
  • enhanced monitoring and heightened service desk support are in place at the start of production the next day
  • engineering teams are ready and on call if an issue arises
  • incident call support and command centers are prepared for potential issues and lead engineers and relevant managers are prepared to immediately be on the calls or meetings

As the test cycle is nearing completion, the release manager ensures all assets for the release are in order. These assets include:

  • a full release plan
  • release deliverables, including code, documentation, and test cases and routines
  • operational procedure changes or updates
  • updated instrumentation
  • training materials and business operations signoff
  • service desk documentation or knowledge management updates
  • appropriate architecture and security reviews and signoffs (if applicable)
  • full change management documentation and adherence
  • release notes

The release manager should be monitoring the testing metrics as the release progresses. Typical metrics include defects (by severity), their discovery rate, their resolution rate, and the test case successful completion rate. These metrics should be compared to previous similar releases, as they can be very revealing as to the quality of the release. The metrics also enable projection of when (or if) the release will reach acceptable quality levels to be implemented into production. These early signs (or warnings) are extremely helpful in positioning leadership to understand the risks and potential impacts of the release. The release manager must be careful not to be a cheerleader for introduction into production regardless of quality, but instead to ensure that all original goals, including functionality, quality, and time to market, are met in a balanced way. Quality should be the least compromised deliverable, since the cost and impacts of a poorly implemented release can overwhelm its benefits.
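
One simple way to project when (or if) a release will reach an acceptable quality level is to compare the defect discovery rate with the resolution rate. The function below is a minimal sketch that assumes both rates stay constant, which real test cycles rarely do; the function name and parameters are illustrative only.

```python
def projected_days_to_target(open_defects, found_per_day, fixed_per_day, target_open=0):
    """Linear projection of the days needed to reach the target open-defect count.

    Returns 0.0 if the target is already met, and None if the backlog is not
    shrinking (defects are being found at least as fast as they are fixed).
    """
    if open_defects <= target_open:
        return 0.0
    net_burn_down = fixed_per_day - found_per_day
    if net_burn_down <= 0:
        return None
    return (open_defects - target_open) / net_burn_down


if __name__ == "__main__":
    # 30 open defects, 2 found and 5 fixed per day: a net burn-down of 3 per day.
    print(projected_days_to_target(30, found_per_day=2, fixed_per_day=5))  # 10.0
    # Discovery keeps pace with resolution: no projected date, an early warning.
    print(projected_days_to_target(30, found_per_day=5, fixed_per_day=5))  # None
```

A result of `None`, or a projected date beyond the planned window, is exactly the kind of early warning that should be raised with leadership.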

Once testing is successful and all assets are in order, the release manager normally convenes a go/no go meeting with the development leaders, the testing lead, and business stakeholders (if needed), and a decision to proceed or not is then made.
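
The input to the go/no go meeting can be prepared as a simple readiness check against the release assets listed earlier. The sketch below is illustrative only: the asset keys and the `go_no_go` function are hypothetical names, and the actual decision remains with the people in the meeting.

```python
# Hypothetical keys mirroring the release assets listed earlier in this article.
REQUIRED_ASSETS = (
    "release_plan",
    "release_deliverables",
    "operational_procedures",
    "updated_instrumentation",
    "training_and_business_signoff",
    "service_desk_documentation",
    "change_management_documentation",
    "release_notes",
)


def go_no_go(completed_assets, testing_passed, reviews_required=False):
    """Return (recommendation, missing_assets) as input to the meeting."""
    required = set(REQUIRED_ASSETS)
    if reviews_required:
        # Architecture and security signoffs apply only to some releases.
        required.add("architecture_security_signoff")
    missing = sorted(required - set(completed_assets))
    recommendation = "go" if testing_passed and not missing else "no-go"
    return recommendation, missing


if __name__ == "__main__":
    done = set(REQUIRED_ASSETS) - {"release_notes"}
    print(go_no_go(done, testing_passed=True))  # ('no-go', ['release_notes'])
```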

Once implementation is a go, operations will facilitate a release implementation call using the operations call handling utility. Normally an operations lead, the release manager or other leads, and technology team leads will participate and coordinate each of the activities defined in the plan in proper order. Progress through the plan is tracked, and at each checkpoint the release manager and operations ensure approvals are given before moving forward.
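
Working through the plan with checkpoints and back out steps can be sketched as follows. This is a simplified model under stated assumptions: the `Step` class, the `execute_plan` function, and the `approve` callback are hypothetical; a checkpoint here means approval is needed before the step starts; and on a failure or withheld approval, every attempted step is backed out in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[[], bool]                      # returns True once verified
    back_out: Optional[Callable[[], None]] = None
    checkpoint: bool = False                     # approval needed before starting


def execute_plan(steps, approve):
    """Run steps in order; return (succeeded, implementation_log)."""
    log, attempted = [], []
    for step in steps:
        if step.checkpoint and not approve(step.name):
            log.append(f"{step.name}: approval withheld")
            break
        attempted.append(step)
        if not step.run():
            log.append(f"{step.name}: failed verification")
            break
        log.append(f"{step.name}: done")
    else:
        return True, log  # every step ran and verified

    # Retreat to the known stable state, most recent step first.
    for step in reversed(attempted):
        if step.back_out is not None:
            step.back_out()
            log.append(f"{step.name}: backed out")
    return False, log


if __name__ == "__main__":
    plan = [
        Step("deploy_database", lambda: True, back_out=lambda: None),
        Step("deploy_application", lambda: True, back_out=lambda: None, checkpoint=True),
    ]
    succeeded, log = execute_plan(plan, approve=lambda name: True)
    print(succeeded, log)
```

The returned log can serve as the starting point for the implementation log of activities.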

When the release implementation is complete, the implementation log of activities is documented. Subsequently, a continuous improvement review should be held with all key participants to discuss the release, its successes or failures, and how the process and implementations can be improved. The resulting lessons should be noted by the release manager, and actions then identified, published, and tracked. This will enable streamlining of the process and increased quality and pace.

 There is an ITIL process diagram available here that has an outline of the steps in the process.

Release Management Process Extensions

Further, for mature shops the change or release process can be extended by introducing the concept of production ready. A system or major update is production ready when it can be introduced into production because it is ready on all the key performance aspects: security, recoverability, reliability, maintainability, usability, and operability. In our typical rush to deliver key features or products, the sustainability of the system is often neglected or omitted. By establishing the Operations team as the final approval gate for a major change to go into production, and leveraging the production ready criteria, organizations can ensure that these often-neglected areas are attended to and properly delivered as part of the normal development process. These steps then enable a much higher performing system in production and avoid customer impacts. For more on production ready, please visit here.

Additionally, the release process can be enhanced through continuous improvement. Whether from failed changes, incident pattern analysis, or industry trends and practices, the release team should always be seeking to identify improvements. High performance or here, high quality, is never reached in one step, but instead in a series of many steps and adjustments. And given how IT systems themselves are dynamic and changing over time, one must be alert to new trends, new issues, new requirements, and adjust.

Often, where strong root cause analysis and follow-up are executed, the focus is only at the individual issue or incident level. This can be all well and good for correcting the one issue, but if broader patterns are missed, one can substantially undershoot optimal performance. Both the trees and the forest must be considered. Questions to ask include: Do issues cluster with one application or infrastructure component? Does a supplier contribute far too many issues? Is inadequate testing a common thread among incidents? Do you have some teams that create far more defects than the norm? Are your designs too complex? Are you using the products in a mainstream or unique manner, especially if you are seeing many OS or product defects? Use these patterns and analysis to identify the systemic issues your organization must fix. There may be process issues (e.g. poor testing), application or infrastructure issues (e.g., obsolete hardware), or other issues (e.g., lack of documentation, incompetent staff). By correcting things both individually and systemically, far greater progress and quality can be achieved.
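
Several of the questions above amount to counting incidents by some attribute and looking for an outsized share. The sketch below shows one way to do that with Python's `collections.Counter`. The incident fields (`component`, `root_cause`), the sample data, and the 30% threshold are assumptions chosen for illustration, not recommended values.

```python
from collections import Counter


def find_clusters(incidents, key, min_share=0.3):
    """Return (value, count) pairs whose share of incidents is at least min_share."""
    counts = Counter(i[key] for i in incidents if i.get(key))
    total = sum(counts.values())
    return [
        (value, count)
        for value, count in counts.most_common()
        if total and count / total >= min_share
    ]


if __name__ == "__main__":
    # Made-up incident records for illustration only.
    incidents = [
        {"component": "payments", "root_cause": "inadequate testing"},
        {"component": "payments", "root_cause": "configuration error"},
        {"component": "crm", "root_cause": "inadequate testing"},
        {"component": "payments", "root_cause": "inadequate testing"},
    ]
    print(find_clusters(incidents, "component", min_share=0.5))
    print(find_clusters(incidents, "root_cause", min_share=0.5))
```

The same function can be pointed at supplier, team, or any other field recorded on the incident to surface the broader pattern.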