Disaster strikes Cloud Direct and we survive to tell the tale
On 5th October 2011 the On Direct sales and service offices, based in Bath, were subject to a power outage lasting nearly half a day. We thought that we would summarise our experience and share our learning points to help you prepare for your own next disaster.
Note: none of this scenario affected the security, performance or availability of our numerous geographically dispersed storage data centres in any way.
The power to the entire managed building, which houses a number of other businesses in The Tramshed, was cut for a seemingly indefinite period, with no indication of what had caused the outage. Without a failover power supply, all voice and communications services within the building were at the mercy of the local electrician (with an estimated call-out time of three hours). So with an important customer base, and award-winning service levels to maintain, what do you do?
Our network infrastructure incorporates a number of desktop machines in addition to laptops, and the laptops afforded us a few hours of access to key software and offline applications. We use cloud-based SharePoint to manage our business content, and each employee maintains a local cache of all business-critical content, as well as the document libraries relevant to them. This proved especially helpful for things like phone lists and key service documentation. The majority of our systems are cloud based, which presents many advantages over their locally installed rivals, assuming that you have connectivity. Only a handful of laptops needed a connection, so we were able to get by on a few wireless hotspots run from a couple of smartphones.
With machines and connectivity live, our backend support was operational. However, we still needed to be able to get in touch with our customers, which meant accessing our VoIP phone system. With basic internet access, we were able to log on to the Voice Direct management portal and redirect business numbers to employees’ personal lines and mobile phones.
After an hour without power and with no sign of recovery, a management decision was taken to send key users to work from home-office locations. With cloud-based systems, our employees are able to work from anywhere at any time. Where possible we sent users home with the VoIP handsets they use in the office. These are preconfigured devices that, once powered and online, work from anywhere, which makes for easy business continuity. Some users were able to download the VoIP management toolbar, which enabled them to configure their VoIP account to work with their personal mobiles or landlines. Had we known that we would be without power for so long, we would have sent more users home earlier, so that the first batch could be up and running in time to let the second batch leave sooner. A strong staff presence onsite, however, enabled us to monitor inboxes, review our VoIP records and make sure that anybody who had attempted to get in touch with us had either been successful or received a call back.
Once the second hour had passed and with laptop battery meters slowly fading, we resigned ourselves to an extended outage and began to wind down the office operation. Logistical arrangements were made for the remaining staff, and a final meeting was held to allocate roles and responsibilities. With employees in fire-fighting mode, key everyday activities were at risk of being overlooked, so this stage proved particularly important. The success of this task depends on who is involved in the meeting; we aimed for a mix of front-line employees and managers to ensure all bases were covered.
As the final members of staff were preparing to leave, the bulbs and servers whirred back into life and continued to function uninterrupted for the rest of the day. While everyone suddenly found themselves with a hundred things to do to make up for lost time, we came close to forgetting that each disaster action needed to be reversed to resume business as normal. In hindsight, this reversal was not completed fully or quickly enough once power was restored. The earlier meeting to delegate responsibilities for key tasks paid dividends here, since we could remain confident that all activities were being addressed.
Key Learning Points
Having all our systems hosted in the cloud, on enterprise-class infrastructure, made light work of this disaster. Our core systems continued to work, and core staff could access them from alternate locations.
However, hosted systems still require connectivity, and VoIP phones depend on it too, so losing power to our routers and handsets introduced unnecessary delays into the invocation process.
To ensure this does not happen next time, we are installing Uninterruptible Power Supplies (UPS) so that key routing equipment and phones continue to work during an outage. With this in place, the same steps would largely be taken, but perhaps in a calmer manner.
Although our disaster scenario lasted nearly half a day, there was no adverse impact on our ability to provide service and support. The enterprise-class infrastructure that runs the service continued to function seamlessly.