IT Service Continuity Management

Description/Summary

Information Continuity Management deals with the implementation and monitoring of a predefined Continuity level for the IT environment. Continuity Management balances the level of insurance against a disaster with the available resources, the requirements and the costs. The starting point is a Continuity analysis, based on Service definitions, SLAs and costs, that defines the customer's requirements for the Continuity level. The internal, minimal Continuity requirement is called the IT basic recovery level. Additional Continuity requirements of the customer have to be defined on an individual basis.

Objectives

Ensure that agreed facilities, services and resources can be resumed within the agreed time scale and at the agreed level of availability.

Information Continuity Management contributes to an integrated Service Management approach by executing the following activities:

  • Identifying and defining the internal and external Continuity requirements
  • Performing a Business Impact Analysis and a Risk Analysis
  • Planning of Continuity strategy and procedures
  • Managing the implementation of Continuity actions and the organization
  • Evaluating the Continuity procedures (preventive and recovery case) and Continuity measures
  • Continuity Reporting
  • Continuity audit and improvement

Roles & Functions

Continuity Management specific roles

Static Process Roles

Continuity Process Owner
Continuity Process Manager

The IT Continuity Management Process is controlled by the Continuity Manager. The Continuity Manager can delegate tasks to specialized staff. If external staff are needed, approval from the person responsible for the budget is required.

Continuity Management Process Staff

Continuity Management staff perform mandated tasks on behalf of the Continuity Manager.

Dynamic Process Roles

Continuity Auditor

Service Specific Roles

Roles depending on the affected service are found in the Service Description.

Service Expert/Service Specialist
Service Owner
Service User

Customer Specific Roles

Roles depending on the affected customer(s) are found in the Service Level Agreement. The Service Level Agreement for the customer specific roles is maintained by the Service Level Agreement Management.

Customer(s)
Customers of the affected Service with a valid SLA

Information artifacts

Continuity Plan (Recovery Plan)

A set of targets, measures, reports and actions to establish, maintain and audit a certain level of Continuity for a set of IT services over a certain time frame (for example, a year).

Continuity Design Record

  • Continuity Design ID
  • Continuity Design Requester
  • Continuity Design Description
  • Continuity Design Agent
  • Continuity Design Owner
  • Service: The Service which has to be (re)designed
  • Risk for Service:
  • Recovery Plan for Service:
  • Fallback Option in case of Recovery Plan Failing:
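
The record fields listed above could be held in a small data structure. The following is a minimal sketch only: the class name, attribute names and the `missing_fields` helper are hypothetical and are not prescribed by this process description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContinuityDesignRecord:
    """Hypothetical in-memory form of the Continuity Design Record."""

    design_id: str
    requester: str
    description: str
    agent: str
    owner: str
    service: str                           # the Service to be (re)designed
    risk: Optional[str] = None             # "Risk for Service"
    recovery_plan: Optional[str] = None    # "Recovery Plan for Service"
    fallback_option: Optional[str] = None  # used if the Recovery Plan fails

    def missing_fields(self) -> list[str]:
        """Return the names of all fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]
```

A record created with only the identifying fields would report `risk`, `recovery_plan` and `fallback_option` as missing, which could be checked before the record is passed on for approval.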

Continuity Transition Record

  • Continuity Transition ID
  • Continuity Transition Requester
  • Continuity Transition Description
  • Continuity Transition Agent
  • Continuity Transition Owner
  • Service: Services affected by the Change
  • Configuration Items: CI affected by the Change

Continuity Verification Record

  • Continuity Verification ID
  • Continuity Verification Requester
  • Continuity Verification Description
  • Continuity Verification Agent
  • Continuity Verification Owner
  • Continuity Verification Auditor
  • Service: Services affected by the Continuity Audit
  • Configuration Items: CI affected by the Continuity Audit
  • Customer: Customer affected by the Continuity Audit
  • User: User affected by the Continuity Audit
  • Expert/Specialists: Expert/Specialists affected by the Continuity Audit

Key Concepts

Continuity Controls

Continuity Data Set for Design

  • Responsible persons: will be defined out of the group of Continuity responsible staff
  • RFC: Interface to Change Management
  • Status
  • Comment
  • Service out of the service catalogue: Interface to the Service Level Management. If the service is known, all further parameters like service responsible person and expert group are known
  • Continuity Requirements: Description of what needs to be changed at the defined service

Life Cycle of Data Set for Design

Continuity Data Set for Implementation

  • GUID
  • Responsible persons: will be defined out of the group of Continuity responsible staff
  • RFC: Interface to Change Management
  • Status
  • Comment

Life Cycle of Data Set for Implementation

Continuity Data Set for Validation

Life Cycle of Data Set for Validation

Continuity Data Set for Maintenance

Life Cycle of Data Set for Maintenance

Risk Assessment, Business Continuity Analysis and Business Impact Analysis as inputs to the Recovery Plan

Several analysis areas are used to define the Continuity Plan:

  • Risk Assessment (CRAMM) helps to determine the likelihood and the possible size of a disaster or other disruption. Risk assessment is defined by the analysis of
    • Assets
    • Threats
    • Vulnerabilities
  • Business Continuity Management: analysis of business processes to ensure a minimal service, minimization of risks and development of plans for recovery
  • Business Impact Analysis: determining how much the organization stands to lose as a result of a disaster or other service disruption

These analysis areas yield the risks and their probabilities. Defining countermeasures is the task of the recovery plan.
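
To illustrate how assets, threats and vulnerabilities might be combined into a ranking of risks, the following sketch multiplies three ratings into one score. This is an assumption made for illustration: the 1 to 5 scales, the multiplicative score and all names are hypothetical and do not reproduce the CRAMM method itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskItem:
    asset: str
    asset_value: int    # assumed scale: 1 (low) to 5 (high)
    threat_level: int   # assumed scale: 1 to 5
    vulnerability: int  # assumed scale: 1 to 5

    @property
    def score(self) -> int:
        # Simple multiplicative score; an illustrative assumption, not CRAMM.
        return self.asset_value * self.threat_level * self.vulnerability


def rank_risks(items: list[RiskItem]) -> list[RiskItem]:
    """Order risks from the highest to the lowest score."""
    return sorted(items, key=lambda item: item.score, reverse=True)
```

The highest-ranked items would be the first candidates for countermeasures in the recovery plan.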

Recovery Options

  • do nothing
  • fallback to a manual system
  • mutual agreements
  • gradual recovery (Cold Standby) > 72 h
  • intermediate recovery (Warm Standby) 24 - 72 h
  • immediate recovery (Hot Standby) < 24 h
  • combination of options
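
The time bands of the three standby options can be expressed as a simple lookup. This is a sketch under the assumption that the tolerated recovery time is given in hours; the function name is hypothetical, and the remaining options (do nothing, manual fallback, mutual agreements, combinations) are not covered because the list gives no time band for them.

```python
def recovery_option(max_recovery_hours: float) -> str:
    """Map a tolerated recovery time to the matching standby option."""
    if max_recovery_hours < 0:
        raise ValueError("recovery time must not be negative")
    if max_recovery_hours < 24:
        return "immediate recovery (Hot Standby)"
    if max_recovery_hours <= 72:
        return "intermediate recovery (Warm Standby)"
    return "gradual recovery (Cold Standby)"
```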

Process

Critical Success Factors

Critical Success Factors (CSF) define a limited number of factors influencing the success of a process. For Continuity Management, the following factors can be defined as CSFs:

  • Support from the organization and by the customer
  • Effective Configuration Management Process
  • Close collaboration with Change Management Process
  • Awareness, trainings and tests
  • Acknowledgment of IT-strategy

The Continuity Manager has to take the CSFs into account and define and implement measures to ensure the success of the process.

Continuity Planning

High Level Process Flow Chart

Performance Indicators (KPI)

  • Number of changes to Continuity Plan
  • Number of Continuity Problems

Process Trigger

Event Trigger
Time Trigger

The process is triggered periodically.

Process Specific Rules

  • Each Change to Continuity Plan needs to be documented
  • Each Change to Continuity Plan needs to be communicated

Process Activities

Continuity Plan Definition

Continuity Plan Definition deals with the definition of the

  • Basic Continuity plan

Activity Specific Rules

  • Continuity Manager is set to the person who is responsible for the Continuity process
  • Continuity Management Team is set to the staff involved in Continuity assurance activities
  • The Continuity Plan is defined:
    • add new rules
    • modify existing rules
    • delete expired rules

Continuity Plan Monitoring

Continuity Plan Monitoring is responsible for the monitoring and update of Continuity planning.

Activity Specific Rules

  • Communicate Continuity plan
  • Ensure that staff understand and follow the rules by auditing and training them
  • trigger updates of the Continuity Plan if necessary

Continuity Design Assistance

The Design sub-process in Continuity Management is responsible for the initial planning of the Continuity process and for planning its optimizations. If a change proposal for the Continuity process is classified, the change needs to be planned as well. This is performed by an expert from the responsible expert group (interface to the Service Level Management). This sub-process is triggered by Service Design. After the planning of the Continuity measures is finished, the status needs to be changed to "planned".

Process owner is the Continuity Manager. Process agent is an expert assigned by "Continuity Staff".

High Level Process Flow Chart

Performance Indicators (KPI)

  • Number of new Continuity Design Assistance Requests
  • Number of "approved - successful" and "approved - unsuccessful" Continuity Design Assistance Requests
  • Ratio of "approved - successful" and "approved - unsuccessful" Continuity Design Assistance Requests
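
The three indicators above could be derived from the list of request statuses in a reporting period. This is a minimal sketch: the function name and the input format (one status string per request opened in the period) are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def design_request_kpis(statuses: list[str]) -> dict[str, Optional[float]]:
    """Compute the Design Assistance KPIs from one status per request."""
    counts = Counter(statuses)
    successful = counts["approved - successful"]
    unsuccessful = counts["approved - unsuccessful"]
    # The ratio is undefined while there are no unsuccessful requests.
    ratio = successful / unsuccessful if unsuccessful else None
    return {
        "new_requests": len(statuses),
        "approved_successful": successful,
        "approved_unsuccessful": unsuccessful,
        "ratio": ratio,
    }
```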

Process Trigger

Event Trigger

The process is initiated by Change Management

Time Trigger

Process Specific Rules

  • Each Continuity Design Assistance request must be recorded
  • Continuity Design Assistance Agent has to document the request and the result
  • Continuity Design Assistance Owner has to control the agent
  • Continuity Design Assistance requester has to be informed on design status

Process Activities

Design of Continuity Part in Service Design Package

Within this activity the Continuity section of the service design package is designed.

Activity Specific Rules

  • Set the Continuity Design Owner to a member of the IT Continuity Management Staff
  • Set the Continuity Design Agent to a member of the Service Expert or Specialist Group
  • Design Continuity according to the Continuity Plan
  • Coordinate the Continuity Design package with the activity "Continuity Design" and other relevant Service Designers
  • Document in the Service Design package and fill out the Continuity Design Record
  • Go to control activity "designed"

Approval of Continuity Design Package

With this activity the Continuity Manager decides on the service design package. The decision is based on the cost expectations and the Continuity Definition. In general, three results of this activity are possible:

  • Continuity Design Package is finally rejected
  • Continuity Design Package is temporarily refused and returned to the Continuity expert for improvement or optimization of the Feasibility Study
  • Continuity Design Package is accepted.

Activity Specific Rules

  • Set the Continuity Design Agent to Continuity Manager
  • Approve the Continuity Design Package and Documentation in the Continuity Design Record
  • On approval, go to activity "approved - successful"
  • Otherwise:
    • Go to control activity "new" for a re-design of the Continuity Design Package OR
    • Go to control activity "approved - unsuccessful" for final abortion.
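
The control activities named in the two activities above can be read as a small state machine. The transition table below is an assumption derived from the activity rules, not an official workflow definition, and the function name is hypothetical.

```python
# Control activities named in the text; the allowed moves are an assumption
# read off the activity specific rules.
ALLOWED_TRANSITIONS = {
    "new": {"designed"},
    "designed": {"approved - successful", "approved - unsuccessful", "new"},
    "approved - successful": set(),
    "approved - unsuccessful": set(),
}


def advance(current: str, target: str) -> str:
    """Return the new control activity, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot go from {current!r} to {target!r}")
    return target
```

A request sent back for re-design moves from "designed" to "new" and can then pass through "designed" again, while both approval results are final.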

Continuity Transition Assistance

This activity, in cooperation with Change Management, supports the implementation and testing of Continuity improvements by designing, testing, implementing and testing the implementation again. These actions are led by Change Management.

If a Continuity Design Package is authorized and approved by Change Management, all actions, functional descriptions and implementation procedures described in the improvement proposal need to be detailed, tested and approved in cooperation with Change Management. Afterwards the implementation should be assisted, to provide help in case of emergencies or implementation issues.

The final Post Implementation Review (PIR), conducted together with Change Management, also includes testing.

High Level Process Flow Chart

Performance Indicators (KPI)

  • Number of Continuity Transition Assistance Requests
  • Throughput time (min/average/max) for a Transition Assistance Request until "assisted - closed"
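
The throughput indicator could be computed from the opening and closing times of each request. This is a sketch under the assumption that each request is available as a pair of timestamps (recorded, "assisted - closed"); the function name is hypothetical.

```python
from datetime import datetime, timedelta


def throughput_stats(
    requests: list[tuple[datetime, datetime]],
) -> dict[str, timedelta]:
    """Return min, average and max duration from opening to closure."""
    durations = [closed - opened for opened, closed in requests]
    if not durations:
        raise ValueError("at least one closed request is required")
    average = sum(durations, timedelta()) / len(durations)
    return {"min": min(durations), "average": average, "max": max(durations)}
```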

Process Trigger

Event Trigger

Process is triggered by Change Management.

Time Trigger

Process Specific Rules

  • Each Continuity Transition Request must be recorded
  • Continuity Transition Assistance Agent has to document the request and the result
  • Continuity Transition Assistance Owner has to control the agent
  • Continuity Transition Assistance Agent has to coordinate his work with other transition agents
  • Continuity Transition Assistance requester has to be informed on transition status

Process Activities

Creation Risk Analysis and Feasibility Study

If Continuity transition assistance is requested by Change Management, the following two artifacts need to be defined:

  • Feasibility Study
  • Risk Analysis

The following aspects need to be addressed within a Feasibility Study:

  • Feasibility of proposal
  • Risk of implementation
  • Risk of neglecting proposal
  • Costs

A Feasibility Study is based on high level planning and should not address detailed planning because of the possibility that the proposed change will not be accepted. Detailed planning of the proposal is part of the Transition sub-process.

A Feasibility Study is provided by the expert team of the addressed service. Additional requirements provided by other assistance processes such as Financial, Security or Capacity Management may also need to be considered.

After the responsible expert finishes their contribution to the planning of the Continuity actions, the status of the design package needs to be set to planned.

Activity Specific Rules

  • Set the Continuity Transition Owner to a member of the IT Continuity Management Staff
  • Set the Continuity Transition Agent to a member of the Service Expert or Specialist group
  • Create a Continuity Feasibility Study and Risk Analysis according to the Change Requirements and the Continuity Plan
  • Coordinate the creation of the Continuity Feasibility Study and Risk Analysis package with others
  • Document Continuity Feasibility Study and Risk Analysis in the Continuity Transition Record
  • Go to activity "created"

Build - Test - Implement - Assistance

If a Continuity change is approved for implementation, Change Management tasks Continuity Management, or more precisely the sub-process Build - Test - Implement - Assistance, with:

  • Detailed definition of implementation instructions for the Continuity improvement,
  • Detailed definition of test procedures and documents for the Continuity improvement,
  • Support of implementation,
  • Testing of implementation and
  • Approval of the implementation

For all activities above detailed documents are necessary.

Within the documentation of the improvement proposal, the order and timeline of actions need to be described. Testing documentation has to address the test design and assure the effectiveness of the test.

Implementation activities are fulfilled and headed by Change Management - Continuity Management is only assisting and supporting regarding the Continuity aspects and functions. Continuity Management can be defined as agent by Change Management for some implementation steps.

The Test is split up in two main test areas:

  • 1. Test of the implementation - "Does the provided document describe the right implementation actions in the right implementation time order?"

If this test is positive, then

  • 2. Test of the functionality of the live system - "Does the system perform the functions defined in the Design sub-process?" This test is performed based on the defined testing documentation. It addresses the test design and the test effect.

If both test areas (see above) are positive, the data set is set to status enabled. Change Management needs to be informed of the positive test result, and the data set status needs to be changed to waiting. In case of negative test results, the testing can be aborted (the test status needs to be set to canceled) or transferred back to the activity consolidation.
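
The two-stage test and the resulting status changes can be sketched as follows. The function and parameter names are hypothetical, and the two test callables stand in for whatever the testing documentation defines; only the status names come from the text.

```python
from typing import Callable


def run_continuity_tests(
    test_implementation: Callable[[], bool],
    test_functionality: Callable[[], bool],
    retry_on_failure: bool,
) -> list[str]:
    """Return the statuses the data set passes through after testing."""
    # "and" short-circuits, so the functionality test of the live system
    # only runs if the test of the implementation was positive.
    if test_implementation() and test_functionality():
        return ["enabled", "waiting"]
    # On a negative result: go back to consolidation, or abort the test.
    return ["consolidation"] if retry_on_failure else ["canceled"]
```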

This activity is performed by expert staff assigned by Continuity Staff. Activity owner is the Continuity Manager.

Activity Specific Rules

  • Support creation of implementation plan including fallback plan
  • Support creation of test plan
  • Support test of implementation including fallback plan
  • Support implementation
  • Document test and implementation results
  • Coordinate work with Change Management
  • Go to activity "assisted"

Evaluation and Closure Assistance

In coordination with the Post Implementation Review of the Change, Continuity Management helps to test the implementation from the continuity point of view. In cases of failed tests, the Change Management has to decide if the fallback plan has to be executed or the implementation can be accepted despite any issues in testing.

Activity Specific Rules

  • Support post implementation review and test
  • Consult Change Management on Fallback Execution
  • Support fallback implementation if necessary
  • Document activities
  • Go to activity "assisted - closed"

Continuity Verification

This activity is performed by the Continuity Manager. The goal of the activity is to verify that an implemented Continuity optimization achieves the planned results. Verification uses the "Four-Eye Principle".

If the actions do not deliver the planned results, the status can be set to re-designed, and the planning activity needs to be started again. If it is decided that a Continuity improvement action is not approved, the status should be set to cancelled. If Continuity Management approves the actions, the status is set to OK. From the perspective of Continuity Management, these actions are then approved and can be implemented once Change Management has authorized them.

High Level Process Flow Chart

This chart illustrates the Continuity Management Verification process and its activities.

Performance Indicators (KPI)

  • Number of Continuity incidents per audit by breach level
  • Number of external Continuity audits
  • Number of internal Continuity audits
  • Top Ten services with the most Continuity incidents
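
The last indicator can be derived by counting incidents per service. This is a minimal sketch, assuming the incidents are available as a list with one service name per Continuity incident; the function name is hypothetical.

```python
from collections import Counter


def top_services(incident_services: list[str], limit: int = 10) -> list[tuple[str, int]]:
    """Return the services with the most Continuity incidents, highest first."""
    return Counter(incident_services).most_common(limit)
```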

Process Trigger

Event Trigger

  • Any request for a Continuity Audit
    • from any Process Manager or Process Owner
    • from any Service Owner
    • from Senior Management
    • from Customer and Customer Owner (Account Manager)

Time Trigger

  • Continuity Audits are often time triggered and processed on a regular basis. Regular Continuity Audits are defined in
    • the Continuity Plan
    • the Process Description
    • the Service Description
    • Service Level Agreements (SLA)
    • Operational Level Agreements (OLA)
    • Underpinning Contracts

Process Specific Rules

Process Activities

Planning of Verification Activities

This activity plans the audit activities. Issues to be addressed by the planning:

  • Which services need to be audited regarding Continuity issues?

This depends on the number of Continuity incidents and the severity of Continuity incidents per service as well as the importance of a service.

  • What is the scope of the audit?

What is the extent of the audit: a full audit or just a check of a few indicators?

  • Auditing method?

Is the audit to be carried out by checks, real-life tests or just a questionnaire?

  • How often is the audit needed?

This depends on the number of Continuity incidents and the severity of Continuity incidents per service as well as the importance of a service and how exposed a service is to external and internal threats.

  • Who is performing the audits?

Auditors need to be rotated often to ensure that the auditor is independent and not influenced by opinions, by the customer relationship or by the wish to keep the auditing contract.

The result of this activity is an audit plan, which needs to be communicated.

Activity Specific Rules

  • set Continuity Verification Agent to member of IT Continuity Management Staff
  • set Continuity Verification Owner to Continuity Manager
  • set Continuity Verification Requester
  • create Continuity Verification Description
  • select Continuity Verification Auditor according to the Continuity Plan
  • document Service affected by the Continuity Audit
  • document Configuration Items affected by the Continuity Audit
  • document Customer affected by the Continuity Audit
  • document User affected by the Continuity Audit
  • document Expert/Specialists affected by the Continuity Audit
  • create audit plan
  • go to control activity "planned"

Performing Verification Activities

Based on an audit plan, the audit is performed by the Continuity Verification Auditor

Activity Specific Rules

  • set Continuity Verification Agent to Continuity Verification Auditor
  • set Continuity Verification Owner to Continuity Manager
  • document result in Continuity Verification Record
  • go to control activity "performed"

Review of Verification Results

Results of the audit need to be checked. If Continuity incidents occur, they need to be classified into severity breach levels and handled by the Process Manager. In case of minor changes, Change Management is addressed; in case of major changes, the process is started with a redesign or new design of the Continuity Package.

Activity Specific Rules

  • set Continuity Verification Agent to Continuity Verification Auditor
  • evaluate and classify each Continuity Incident
  • trigger Incident, Problem or Change Management where necessary
  • document result in Continuity Verification Record
  • go to control activity "reviewed"
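
The routing of review results described for this activity can be sketched as a small function. The `Finding` class, the numeric breach level and the "none" / "minor" / "major" labels are hypothetical; the process description only states that minor changes go to Change Management and major changes start a redesign.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    service: str
    breach_level: int  # assumed: a higher number means a more severe breach
    change_size: str   # hypothetical labels: "none", "minor" or "major"


def route(finding: Finding) -> str:
    """Decide which process handles the change resulting from a finding."""
    if finding.change_size == "minor":
        return "Change Management"
    if finding.change_size == "major":
        return "Continuity Design Assistance (redesign)"
    return "no change required"
```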