Beep, beep, beep… Oh, super: your UPS has been tripped by yet another power outage, but this time the power didn’t kick right back on. You call your local electric company, and they estimate that power will be restored in 5 to 7 hours. While it may be tempting to pack up and call it a day, business requirements still need to be met. That is an unplanned outage. At other times, systems may need to be patched or updated during business hours, perhaps to stop an active security breach or a massive malware invasion. Updates like these are an example of a planned system outage.
In either scenario, communication is key to restoring systems quickly. So, do you have a process in place to communicate the next steps to take during an outage while still ensuring business continuity? If not, work through the questions in the following checklist to build and communicate an effective plan of action that keeps your business up and running during a planned or unplanned outage.
1. Request or Communicate the Outage to Management
For a planned outage, send management an email or similar notice ahead of time that roughly follows this template:
“Service/Product” Outage Alert
- Description: Provide a brief service or product outage description and/or list what hardware and software will be affected.
- Start Time: Include full date and time.
- Outage Duration: How long will the service or product be unavailable, and when should it be back online? (Example: 2:00 AM to 3:00 AM)
- Purpose: Briefly state why you are upgrading or performing maintenance, or what is causing the systems to be down or offline.
- Impact: List which users will be unable to access which services during the outage.
- Detailed Impact: Provide more information on the specific applications or services that may be unreachable during this outage. If more than one application or service will be affected, list each one.
- List Services Not Affected by Outage: State clearly which services users will still be able to access during the outage, and confirm that those services will continue running normally.
- Provide Contact for Questions: For example, “As this email originates from an unmonitored address, please contact name, at (xxx) xxx-xxxx or firstname.lastname@example.org for more information.”
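If you send these alerts often, it can help to keep the template as structured data so that no heading gets skipped. The sketch below is one hypothetical way to do that in Python: the `OutageAlert` class, its field names, and the sample values are illustrative assumptions that simply mirror the headings above, and times are printed in 24-hour format with the end date included in case the window crosses midnight.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class OutageAlert:
    # Hypothetical fields mirroring the template headings above.
    service: str
    description: str
    start: datetime
    duration: timedelta
    purpose: str
    impact: str
    detailed_impact: list[str] = field(default_factory=list)
    unaffected_services: list[str] = field(default_factory=list)
    contact: str = "name, at (xxx) xxx-xxxx"  # placeholder, fill in per alert

    def render(self) -> str:
        """Build the plain-text body of the alert email."""
        end = self.start + self.duration
        lines = [
            f"{self.service} Outage Alert",
            f"Description: {self.description}",
            f"Start Time: {self.start:%Y-%m-%d %H:%M}",
            # Show both dates so a window that crosses midnight is unambiguous.
            f"Outage Duration: {self.start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}",
            f"Purpose: {self.purpose}",
            f"Impact: {self.impact}",
            "Detailed Impact:",
            *(f"  - {item}" for item in self.detailed_impact),
            "Services Not Affected:",
            *(f"  - {item}" for item in self.unaffected_services),
            f"Questions: please contact {self.contact} for more information.",
        ]
        return "\n".join(lines)


# Example values are made up for illustration.
alert = OutageAlert(
    service="Email",
    description="Mail server firmware update",
    start=datetime(2024, 5, 4, 2, 0),
    duration=timedelta(hours=1),
    purpose="Apply a security patch",
    impact="All staff lose email access",
    detailed_impact=["Webmail", "Mobile sync"],
    unaffected_services=["File shares", "VPN"],
)
print(alert.render())
```

Keeping "Services Not Affected" as its own required list makes it harder to forget that section, which is often the one users care about most.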
2. Impact Analysis – Evaluation
It’s important to determine how the outage will affect all systems and how those systems interact with one another. For instance, is this a mission-critical application that runs continuously and will affect all users, or is it something like a backup job, which runs only at night or after hours? You will also need to identify whether these systems depend on one another, so that a delay in one causes a chain reaction of delays in the next, the one after that, and so on, like a group of batch files that run consecutively.
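One way to find those chain reactions is to write down which systems depend on which, then walk the map outward from the system going offline. The Python sketch below assumes a hypothetical dependency map (the system names are invented, including a chain of consecutive batch jobs) and uses a breadth-first search to list every downstream system that would be delayed, nearest dependents first.

```python
from collections import deque

# Hypothetical map: each system lists the systems that consume its output
# and therefore cannot run (or run correctly) while it is down.
DOWNSTREAM = {
    "database": ["batch_extract", "crm"],
    "batch_extract": ["batch_transform"],
    "batch_transform": ["batch_load"],
    "batch_load": ["nightly_report"],
    "crm": [],
    "nightly_report": [],
    "backup_job": [],
}


def affected_by(outage: str, downstream: dict[str, list[str]]) -> list[str]:
    """Return every system delayed by an outage of `outage`, in the order
    a breadth-first walk reaches them (nearest dependents first)."""
    seen = {outage}
    order = []
    queue = deque([outage])
    while queue:
        current = queue.popleft()
        for dependent in downstream.get(current, []):
            if dependent not in seen:  # avoid double-counting shared dependents
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


print(affected_by("database", DOWNSTREAM))
print(affected_by("backup_job", DOWNSTREAM))
```

Taking the database offline in this made-up map delays the whole batch chain and the nightly report, while the standalone backup job affects nothing else; that difference is exactly what the evaluation step is meant to surface.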
3. Impact Analysis – Summary
This segment should summarize all the items identified in the evaluation, so you can better understand the impact of each application or service that will be shut down during the outage.
4. Detailed Planning for the Outage
This section includes the following pieces:
- Arranging for staff during the outage period and the time afterwards until all operations are running normally
- Handling backups: request an additional backup before planned outages, or obtain the most recent backup after an unplanned outage
- Turning off any monitoring alarms during the outage, if applicable
- Extending communication of the outage to customers, partners, and vendors if necessary
- Properly communicating the outage and its effects on all systems to all internal staff members
5. Detailed Plan of Action
This area consists of a list of all actions that will need to be taken during the outage, including the complete length of time for the outage, the impact analysis, and the notification plan. In short, it is a very specific list of steps compiled from the previous checklist items.
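Because the plan of action has to state the complete length of the outage, it helps to give each step an estimated duration and lay the steps out back to back. The Python sketch below assumes a hypothetical plan (the step names and minute estimates are invented, loosely drawn from the planning items above) and computes a start and end time for each step plus the total outage window.

```python
from datetime import datetime, timedelta

# Hypothetical plan: (action, estimated minutes), run strictly in order.
PLAN = [
    ("Send outage notification to staff and customers", 10),
    ("Run additional backup", 30),
    ("Disable monitoring alarms", 5),
    ("Apply patch and restart servers", 45),
    ("Verify services and re-enable alarms", 20),
]


def schedule(start: datetime, plan: list[tuple[str, int]]):
    """Yield (step_start, step_end, action) for each step, back to back."""
    current = start
    for action, minutes in plan:
        end = current + timedelta(minutes=minutes)
        yield current, end, action
        current = end  # the next step begins when this one finishes


start = datetime(2024, 5, 4, 2, 0)
rows = list(schedule(start, PLAN))
for begin, end, action in rows:
    print(f"{begin:%H:%M}-{end:%H:%M}  {action}")

total = rows[-1][1] - start
print(f"Total outage window: {total}")
```

Summing the estimates this way makes it obvious when a plan no longer fits the window promised in the outage alert, so you can adjust the steps or the announcement before anything goes offline.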
6. Management Approval and Sign-Off
Finally, once all other planning details are finalized, have all affected business departments review and approve the outage plan. In your organization, these approvals may include Change Management, IT Management, Operations Management, and any other groups affected by changes or outages to company systems. These approvals, along with any additions or corrections, may come in the form of change request forms or approval meetings.
Whether your outage is planned or unplanned, having a process that keeps corporate systems running properly before the outage, during it, and after any resulting changes will keep things sailing smoothly through the choppy waves that so often come with system outages.