Incident management

Everyone will have an incident or continuity event to deal with sooner or later. It might be minor and hardly noticed, or it might be major and felt by the business and its clients, so it's worth having some guidelines in place as part of your wider continuity plan.

 

Firstly, don't let perfect get in the way of good. Some kind of plan, any plan, is better than no plan at all.

 

IT&E recommend having an area specified as the incident management hub. This could be as simple as a flip chart in the corner, or a whiteboard on the wall, or even a messaging tool. But the important thing is that people know what it is, where it is located and how to access it.

 

Those at the sharp end, working to resolve the issue, should start with three headings: 1 - What do we know? 2 - What have we done? 3 - What do we need to do? Populate these sections as tasks are completed and information comes to light.
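If the hub is a messaging tool rather than a flip chart, the three headings can be kept as a small running log. The following is only a minimal sketch in Python: the `IncidentBoard` class and its `add` and `render` methods are hypothetical names invented for illustration, not part of any IT&E tool, and it assumes that a timestamped plain-text list is enough for your team.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The three headings come from the text above; everything else is illustrative.
HEADINGS = ("What do we know?", "What have we done?", "What do we need to do?")


@dataclass
class IncidentBoard:
    """Hypothetical in-memory stand-in for the flip chart or whiteboard."""

    entries: dict = field(default_factory=lambda: {h: [] for h in HEADINGS})

    def add(self, heading, note, when=None):
        """Record a timestamped note under one of the three headings."""
        if heading not in self.entries:
            raise ValueError(f"Unknown heading: {heading}")
        stamp = (when or datetime.now()).strftime("%H:%M")
        self.entries[heading].append(f"[{stamp}] {note}")

    def render(self):
        """Return the board as plain text, ready to paste into a messaging tool."""
        lines = []
        for number, heading in enumerate(HEADINGS, start=1):
            lines.append(f"{number} - {heading}")
            notes = self.entries[heading] or ["(nothing yet)"]
            lines.extend(f"    {note}" for note in notes)
        return "\n".join(lines)


board = IncidentBoard()
board.add("What do we know?", "Several users cannot open files.")
print(board.render())
```

Timestamping each note is a design choice rather than a requirement: it makes the later review easier, because the order in which facts emerged and actions were taken is preserved.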

 

Although it's the last thing you are going to want to do when in the middle of an incident, it is important that you remember to communicate with internal and external stakeholders.

 

IT&E recommend having another set of headings ready to be populated: 1 - What is the problem? 2 - Why is it a problem for us or our clients? 3 - When did it start to impact us? 4 - How are we going to resolve the issue? 5 - When will that fix be in place?
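Having the five headings ready in advance could be as simple as a fill-in template. The sketch below is one possible way to do that in Python; the `build_update` function and the "To be confirmed" placeholder are assumptions made for illustration, and the output format simply mirrors the "heading = answer" layout used in this article.

```python
# The five headings come from the text above; the function itself is hypothetical.
QUESTIONS = (
    "What is the problem?",
    "Why is it a problem for us or our clients?",
    "When did it start to impact us?",
    "How are we going to resolve the issue?",
    "When will that fix be in place?",
)


def build_update(answers):
    """Format five answers, given in heading order, as a plain-text update."""
    if len(answers) != len(QUESTIONS):
        raise ValueError(f"Expected {len(QUESTIONS)} answers, got {len(answers)}")
    lines = []
    for number, (question, answer) in enumerate(zip(QUESTIONS, answers), start=1):
        # An unanswered heading is shown as such rather than silently dropped.
        text = answer.strip() or "To be confirmed"
        lines.append(f"{number} - {question} = {text}")
    return "\n".join(lines)


print(build_update([
    "Users cannot open or amend documents.",
    "Document production has stopped.",
    "Just after opening this morning.",
    "",
    "",
]))
```

Showing an unanswered heading as "To be confirmed" lets an early update go out before the fix and its timing are known, which is usually better than staying silent.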

 

An example scenario:

Resolution

 

1 - What do we know? = Several users have reported that they cannot open files because they are encrypted.

 

2 - What have we done? = Put steps in place to halt or slow the spread of the encryption, and started trying to locate the source.

 

3 - What do we need to do? = Confirm the source and that the spread has indeed been halted. Make sure the environment is stable and safe enough to restore our data and get back to work.

 

Communication

 

1 - What is the problem? = The encrypted files have stopped users from amending documents, and isolating the affected servers and storage as a countermeasure has stopped any new documents from being centrally produced.

 

2 - Why is it a problem for us or our clients? = Document production has ground to a halt, leaving 50% of staff unable to do their jobs. This has already impacted workflow and deadlines.

 

3 - When did it start to impact us? = The alarm was raised just after we opened for business this morning. Initial investigations show that the encryption started in the middle of the night, so it had been running for several hours before it was noticed.

 

4 - How are we going to resolve the issue? = The infected machine(s) have been taken offline and quarantined, and an area has been created to restore the affected files from backup. There will undoubtedly be some work lost as a result of the gap between taking the backup and restoring the files, so extra measures have been put in place to reconstitute the files and get the workflows back online.

 

5 - When will that fix be in place? = The files and workflow should be restored by 1600 today.

 

With operations restored, set aside some time to go over the incident and learn any lessons that will help next time.