Availability Management
Description/Summary
Information Availability Management deals with the implementation and monitoring of a predefined Availability level for the IT environment. It balances the level of insurance against a disaster with the available resources, requirements and costs. The starting point is an Availability analysis together with the business requirements; based on Service definitions, SLAs and costs, these define the customer's requirements on the Availability level. The internal, minimal Availability requirement is called the IT basic recovery level. Additional Availability requirements of the customer have to be defined on an individual basis. A further challenge is to balance the internally required Availability level with the service levels provided by the external and internal providers of IT services.
The basic constraint of Availability Management is to keep the required SLAs balanced with reasonable costs and effort.
Objectives
Ensure that agreed facilities, services and resources can be resumed within the agreed time scale and at the agreed level of availability.
Information Availability Management contributes to an integrated Service Management approach by executing the following activities:
- Identifying and defining the internal and external Availability requirements
- Performing a Business Impact Analysis and a Risk Analysis
- Planning of Availability strategy and procedures
- Managing the implementation of Availability actions and the organization
- Evaluating the Availability procedures (preventive and recovery case) and Availability measures
- Availability Reporting
- Availability audit and improvement
Roles & Functions
Availability Management specific roles
Static Process Roles
- Availability Process Owner
- Availability Process Manager
The IT Availability Management Process is controlled by the Availability Manager. The Availability Manager can delegate tasks to specialized staff. If external staff are needed, approval by the person responsible for the budget is required.
- Availability Management Process Staff
Staff in Availability Management perform mandatory tasks on behalf of the Availability Manager.
Dynamic Process Roles
- Availability Auditor
The Availability Auditor verifies availability policies, processes and tools. The auditor role should alternate between an external and an internal resource to provide an independent and reliable audit.
Service Specific Roles
Roles depending on the affected service are found in the Service Description. The Service Description, including the service specific roles, is provided by Service Portfolio Management.
- Service Expert/Service Specialist
- Service Owner
Customer Specific Roles
Roles depending on the affected customer(s) are found in the Service Level Agreement. The Service Level Agreement for the customer specific roles is maintained by the Service Level Agreement Management.
- Customer(s)
- Customers of the affected Service with a valid SLA
Information artifacts
Availability Plan
A set of targets, measures, reports and actions to establish, maintain and audit a certain level of Availability for a set of IT services over a certain time frame (for example, one year).
Availability Design Record
- Availability Design ID
- Availability Design Requester
- Availability Design Description
- Availability Design Agent
- Availability Design Owner
- Service: The Service which has to be (re)designed
- Service Availability
- Service Reliability
- Service Resilience
- Service Maintainability
- Service is a VBF?
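As an illustration, the record fields listed above could be modelled as a simple data structure. The following is a minimal sketch in Python; the class name, the field names and the field types are assumptions chosen for this example and are not prescribed by the process definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AvailabilityDesignRecord:
    """Hypothetical container mirroring the Availability Design Record fields."""

    design_id: str
    requester: str
    description: str
    agent: str
    owner: str
    service: str  # the Service which has to be (re)designed
    # The four service attributes are free-text targets here (assumption);
    # the source does not prescribe units or formats for them.
    service_availability: Optional[str] = None
    service_reliability: Optional[str] = None
    service_resilience: Optional[str] = None
    service_maintainability: Optional[str] = None
    is_vbf: bool = False  # "Service is a VBF?"

    def missing_fields(self) -> List[str]:
        """Return the names of the service attributes not filled in yet."""
        attributes = (
            "service_availability",
            "service_reliability",
            "service_resilience",
            "service_maintainability",
        )
        return [name for name in attributes if getattr(self, name) is None]


# Example values are invented for illustration only.
record = AvailabilityDesignRecord(
    design_id="AD-0001",
    requester="Service Design",
    description="Redesign availability for the e-mail service",
    agent="Service Expert",
    owner="Availability Management Staff",
    service="E-Mail",
    service_availability="99.5 % per month",
    is_vbf=True,
)
print(record.missing_fields())
```

A helper such as `missing_fields` makes it easy to check whether a record is complete before it is handed over for approval.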
Availability Transition Record
- Availability Transition ID
- Availability Transition Requester
- Availability Transition Description
- Availability Transition Agent
- Availability Transition Owner
- Service: Services affected by the Change
- Configuration Items: CI affected by the Change
Key Concepts
Terms
- Availability
- Ability of an IT service to fulfill its defined function within defined functional parameters at a defined service level
- Reliability
- Measure of how long a service operates without issues
- Resilience
- Ability of a CI or Service to keep operating even in case of partial failure; this ability improves both Availability and Reliability
- Maintainability
- Degree to which an IT service component can be restored from status „failed“ to status „up and running“ in accordance with the defined SLA
- Serviceability
- Agreements with further service delivery partners such as Facility Management or other external IT service providers
- Vital Business Functions (VBF)
- VBFs are critical, business-related elements of business processes that are supported by IT.
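The terms above are often quantified with a few simple ratios. The sketch below assumes the widely used conventions that Availability is the share of agreed service time in which the service was up, Reliability is expressed as mean time between failures (MTBF), and Maintainability as mean time to restore service (MTRS). These formulas and all function names are assumptions for illustration; this document does not prescribe them.

```python
def availability_percent(agreed_service_hours: float, downtime_hours: float) -> float:
    """Share of the agreed service time in which the service was available."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100.0


def mtbf_hours(agreed_service_hours: float, downtime_hours: float, breaks: int) -> float:
    """Mean time between failures: uptime divided by the number of service breaks."""
    if breaks <= 0:
        raise ValueError("breaks must be positive")
    return (agreed_service_hours - downtime_hours) / breaks


def mtrs_hours(downtime_hours: float, breaks: int) -> float:
    """Mean time to restore service: downtime divided by the number of service breaks."""
    if breaks <= 0:
        raise ValueError("breaks must be positive")
    return downtime_hours / breaks


# Invented example: 30 days of 24h service, 6 hours of downtime in 3 breaks.
agreed = 30 * 24
availability = availability_percent(agreed, 6)
mtbf = mtbf_hours(agreed, 6, 3)
mtrs = mtrs_hours(6, 3)
print(round(availability, 2), mtbf, mtrs)
```

Whatever formulas are chosen, they should be agreed in the SLA so that the provider and the customer measure the same thing.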
Availability Controls
Availability Data Set for Design
- Responsible persons
- Chosen from the group of staff responsible for Availability
- RFC
- Interface to Change Management
- Status
- Comment
- Service out of the service catalogue
- Interface to Service Level Management. If the service is known, all further parameters, such as the person responsible for the service and the expert group, are known as well
- Availability Requirements
- Description of the availability level for the defined service
Life Cycle of Data Set for Design
Availability Data Set for Implementation
- GUID
- Responsible persons
- Chosen from the group of staff responsible for Availability
- RFC
- Interface to Change Management
- Status
- Comment
Life Cycle of Data Set for Implementation
Availability Data Set for Validation
Life Cycle of Data Set for Validation
Availability Data Set for Maintenance
Life Cycle of Data Set for Maintenance
Process
Critical Success Factors
Critical Success Factors (CSFs) are a limited number of factors that influence the success of a process. For Availability Management, the following factors can be defined as CSFs:
- Clear business requirements for availability
- Support from the organization and by the customer
- Effective Configuration Management Process
- Close collaboration with Change Management Process
- Awareness, training and tests
- Acknowledgment of the IT strategy
- Clear definition of key terms like „time = time to react“
- Monitoring from customer’s point of view
- Collaboration with other Delivery Processes
- Knowledge of technology market
The Availability Manager has to take the CSFs into account and define and implement measures that ensure the success of the process.
Availability Planning
High Level Process Flow Chart
This chart illustrates the Availability Management Planning process and its activities.
Performance Indicators (KPI)
- Number of changes to Availability Plan
- Number of Availability Problems
Process Trigger
Event Trigger
Time Trigger
The process is triggered periodically.
Process Specific Rules
- Each Change to Availability Plan needs to be documented
- Each Change to Availability Plan needs to be communicated
Process Activities
Availability Plan Definition
Availability Plan Definition deals with the definition of the following:
- Basic availability plan
Activity Specific Rules
- Set the Availability Manager to the person who is responsible for the availability process
- Set the Availability Management Team to the staff involved in availability assurance activities
- Define the Availability Plan:
- Add new rules
- Modify existing rules
- Delete expired rules
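The three maintenance operations on the Availability Plan, combined with the process rule that each change needs to be documented, can be sketched as follows. This is a minimal illustration under stated assumptions: the class `AvailabilityPlan`, its methods, the rule IDs and the rule texts are all hypothetical and only show one possible way to keep a change log alongside the rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AvailabilityPlan:
    """Hypothetical in-memory plan: rule ID -> rule text, plus a change log."""

    rules: Dict[str, str] = field(default_factory=dict)
    change_log: List[str] = field(default_factory=list)

    def add_rule(self, rule_id: str, text: str) -> None:
        if rule_id in self.rules:
            raise KeyError(f"rule {rule_id} already exists")
        self.rules[rule_id] = text
        self.change_log.append(f"added {rule_id}")

    def modify_rule(self, rule_id: str, text: str) -> None:
        if rule_id not in self.rules:
            raise KeyError(f"rule {rule_id} does not exist")
        self.rules[rule_id] = text
        self.change_log.append(f"modified {rule_id}")

    def delete_rule(self, rule_id: str) -> None:
        del self.rules[rule_id]  # raises KeyError if the rule is unknown
        self.change_log.append(f"deleted {rule_id}")


# Invented example rules, for illustration only.
plan = AvailabilityPlan()
plan.add_rule("R1", "E-mail service: 99.5 % availability per month")
plan.add_rule("R2", "Legacy fax gateway: best effort")
plan.modify_rule("R1", "E-mail service: 99.9 % availability per month")
plan.delete_rule("R2")
print(plan.change_log)
```

The length of the change log also yields the KPI "Number of changes to Availability Plan" directly.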
Availability Plan Monitoring
Availability Plan Monitoring is responsible for monitoring and updating the availability planning.
Activity Specific Rules
- Communicate availability plan
- Ensure that staff understand and follow the rules by auditing and training them
- Trigger updates of the Availability Plan if necessary
Availability Design Assistance
The Design sub-process of Availability Management is responsible for the initial planning of the Availability process and for planning its optimizations. If a change proposal concerning the Availability process has been classified, the change needs to be planned as well. The planning is performed by an expert from the responsible expert group in coordination with a member of the IT Availability Staff. The sub-process is triggered by Service Design. Actions aimed at improving Availability need to be documented in detail in an RFC document. Once the planning of the Availability measures is finished, the status needs to be changed to „plan“.
The process owner is the Availability Manager. The process agent is an expert assigned by „Availability Staff“.
High Level Process Flow Chart
This chart illustrates the Availability Management Design process and its activities.
Performance Indicators (KPI)
- Reaction time in case of major availability issues
Process Trigger
Event Trigger
The process is initiated by Service Design.
Time Trigger
Process Specific Rules
- Each Availability Design Assistance request must be recorded
- Availability Design Assistance Agent has to document the request and the result
- Availability Design Assistance Owner has to control the agent
- Availability Design Assistance requester has to be informed on design status
Process Activities
Design of Availability Part in Service Design Package
Within this activity the Availability section of the service design package is designed.
Activity Specific Rules
- Set the Availability Design Owner to a member of the IT Availability Management Staff
- Set the Availability Design Agent to a member of the Service Expert or Specialist Group
- Design Availability according to the Availability Plan
- Coordinate the Availability Design package with the activity „Availability Design“ and other relevant Service Designers
- Document in the Service Design package and fill out the Availability Design Record
- Go to control activity „designed“
Approval of Availability Design Package
In this activity the Availability Manager decides on the service design package. The decision is based on the cost expectations and the Availability Definition. In general, three results of this activity are possible:
- The Availability Design Package is finally rejected
- The Availability Design Package is temporarily refused and returned to the Availability expert for improvement or optimization of the Feasibility Study
- The Availability Design Package is accepted
Activity Specific Rules
- Set the Availability Design Agent to Availability Manager
- Approve the Availability Design Package and document the decision in the Availability Design Record
- On approval go to activity „approved – successful“
- Otherwise:
- Go to control activity „new“ for a re-design of the Availability Design Package, or
- Go to control activity „approved – unsuccessful“ for final rejection.
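The three possible outcomes of the approval and the control activity each one leads to can be written down as a small lookup. The sketch below is only an illustration: the enum `Decision` and the function `next_control_activity` are hypothetical names, while the control activity names are taken from the rules above.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical names for the three possible approval results."""

    ACCEPTED = "accepted"
    RETURNED_FOR_REDESIGN = "returned"
    REJECTED = "rejected"


# Control activity names as used in the activity specific rules.
NEXT_CONTROL_ACTIVITY = {
    Decision.ACCEPTED: "approved – successful",
    Decision.RETURNED_FOR_REDESIGN: "new",
    Decision.REJECTED: "approved – unsuccessful",
}


def next_control_activity(decision: Decision) -> str:
    """Map the Availability Manager's decision to the next control activity."""
    return NEXT_CONTROL_ACTIVITY[decision]


for decision in Decision:
    print(decision.value, "->", next_control_activity(decision))
```

Keeping the mapping in one table makes it obvious that every decision has exactly one follow-up activity.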
Availability Transition Assistance
This activity, in cooperation with Change Management, supports the implementation and testing of Availability improvements by designing and testing them, implementing them, and testing the implementation again. These actions are headed by Change Management.
If an Availability Design Package is authorized and approved by Change Management, all actions, functional descriptions and implementation procedures described in the improvement proposal need to be detailed, tested and approved in cooperation with Change Management. Afterwards, Availability Management should assist the implementation in order to help in case of emergencies or implementation issues.
The final Post Implementation Review (PIR), conducted together with Change Management, also includes testing.
High Level Process Flow Chart
This chart illustrates the Availability Management Transition process and its activities.
Performance Indicators (KPI)
- Availability
- Reliability
- Maintainability
- Serviceability
- Total number of RFCs needed to compensate for a poor availability plan, broken down per priority, per service, per CI, per user, per customer, per location, per employee, …
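Breaking the RFC count down by one of the listed dimensions amounts to a simple grouping. The following sketch assumes that each RFC is available as a dictionary with keys such as `priority` and `service`; this record layout, the function name `count_rfcs` and the sample data are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable


def count_rfcs(rfcs: Iterable[Dict[str, str]], dimension: str) -> Counter:
    """Count RFCs per value of the given dimension (e.g. priority, service)."""
    return Counter(rfc[dimension] for rfc in rfcs)


# Invented sample data; real RFCs would come from Change Management.
rfcs = [
    {"id": "RFC-1", "priority": "high", "service": "E-Mail"},
    {"id": "RFC-2", "priority": "low", "service": "E-Mail"},
    {"id": "RFC-3", "priority": "high", "service": "ERP"},
]

per_priority = count_rfcs(rfcs, "priority")
per_service = count_rfcs(rfcs, "service")
print(per_priority)
print(per_service)
```

The same function can be reused for any other dimension that is recorded on the RFC, such as CI, customer or location.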
Process Trigger
Event Trigger
The process is started by Change Management.
Time Trigger
Process Specific Rules
- Each Availability Transition Assistance request must be recorded
- Availability Transition Assistance Agent has to document the request and the result
- Availability Transition Assistance Owner has to control the agent
- Availability Transition Assistance Agent has to coordinate their work with other transition agents
- Availability Transition Assistance requester has to be informed on transition status
Process Activities
Creation Risk Analysis and Feasibility Study
If Availability Transition Assistance is requested by Change Management, the following two artifacts need to be created:
- Feasibility Study
- Risk Analysis
The following aspects need to be addressed within a Feasibility Study:
- Feasibility of proposal
- Risk of implementation
- Risk of neglecting proposal
- Costs
A Feasibility Study is based on high-level planning and should not go into detailed planning, because the proposed change may not be accepted. Detailed planning of the proposal is part of the Transition sub-process.
A Feasibility Study is provided by the expert team of the affected service. Additional requirements provided by other assistance processes, such as Financial, Security or Capacity Management, may need to be taken into account.
After the responsible expert has finished their contribution to the planning of the Availability actions, the status needs to be set to planned design package.
Activity Specific Rules
- Set the Availability Transition Owner to a member of the IT Availability Management Staff
- Set the Availability Transition Agent to a member of the Service Expert or Specialist group
- Create an Availability Feasibility Study and Risk Analysis according to the Change Requirements and the Availability Plan
- Coordinate the creation of the Availability Feasibility Study and Risk Analysis package with others
- Document Availability Feasibility Study and Risk Analysis in the Availability Transition Record
- Go to activity „created“
Build – Test – Implement – Assistance
If an Availability change is approved for implementation, Change Management assigns the following tasks to Availability Management, or more precisely to the sub-process Build – Test – Implement – Assistance:
- Detailed definition of implementation instructions for the Availability improvement,
- Detailed definition of test procedures and documents for the Availability improvement,
- Support of implementation,
- Testing of implementation and
- Approval of the implementation
For all activities above detailed documents are necessary.
Within the documentation of the improvement proposal, the order and time line of the actions need to be described. The testing documentation has to address the test design and ensure the effectiveness of the test.
Implementation activities are carried out and headed by Change Management; Availability Management only assists and supports with regard to the Availability aspects and functions. Change Management can appoint Availability Management as agent for some implementation steps.
The test is split into two main areas:
- 1. Test of the implementation: „Does the provided document describe the right implementation actions in the right order?“
If this test is positive, then
- 2. Test of the functionality of the live system: „Does the system perform the functions defined in the Design sub-process?“ This test is performed based on the defined testing documentation. It addresses the test design and the test effect.
If both test areas are positive, the data set is set to status „enabled“. Change Management needs to be informed of the positive test result, and the data set status then needs to be changed to „waiting“. In case of negative test results, the testing can either be aborted (the test status needs to be set to „canceled“) or be handed back to the activity consolidation.
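The two-stage test and the resulting status can be sketched as a small function. This is a minimal illustration, not a definitive implementation: the function name, its parameters and the use of callables for the two tests are assumptions, while the status names follow the description above. The second test only runs if the first one is positive.

```python
from typing import Callable, List


def run_two_stage_test(
    test_implementation: Callable[[], bool],
    test_functionality: Callable[[], bool],
    abort_on_failure: bool = False,
) -> str:
    """Return the resulting status of the two-stage test.

    The functionality test is only executed if the implementation test
    passes (short-circuit evaluation of `and`).
    """
    if test_implementation() and test_functionality():
        return "enabled"
    # Negative result: either abort or hand back to the consolidation activity.
    return "canceled" if abort_on_failure else "consolidation"


# Hypothetical test callables that record the order in which they are called.
calls: List[str] = []


def implementation_fails() -> bool:
    calls.append("implementation")
    return False


def functionality_ok() -> bool:
    calls.append("functionality")
    return True


result_failed = run_two_stage_test(implementation_fails, functionality_ok)
result_passed = run_two_stage_test(lambda: True, lambda: True)
print(result_failed, result_passed, calls)
```

After a result of „enabled“, the remaining step from the description still applies: Change Management is informed and the data set status is changed to „waiting“.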
This activity is performed by expert staff assigned by the Availability Staff. The activity owner is the Availability Manager.
Activity Specific Rules
- Support creation of implementation plan including fallback plan
- Support creation of test plan
- Support test of implementation including fallback plan
- Support implementation
- Document test and implementation results
- Coordinate work with Change Management
- Go to activity „assisted“
Evaluation and Closure Assistance
In coordination with the Post Implementation Review of the Change, Availability Management helps to test the implementation from the availability point of view. If a test fails, Change Management has to decide whether the fallback plan has to be executed or whether the implementation can be accepted despite the issues found in testing.
Activity Specific Rules
- Support post implementation review and test
- Consult Change Management on Fallback Execution
- Support fallback implementation if necessary
- Document activities
- Go to activity „assisted – closed“