Disaster Recovery

Sumanth Kumar
3 min read · Feb 3, 2021

A disaster is any unplanned incident that damages software and can disrupt the business. It can come from anywhere: a natural disaster, an internet outage, a power outage, or a compromise of the systems. In any of these cases, a business that depends heavily on software can stop working if there is no adequate plan to recover.

Disaster recovery is the process of bringing the software back with minimal or no loss to the business. Given its criticality, who wouldn't want to recover from a disaster without any damage? But is that possible without any preparation? I would certainly say NO.

By failing to prepare you are preparing to fail - Benjamin Franklin

  1. Let’s say data backups are taken every day. Are we keeping them in a different geographical region, or perhaps in a separate cloud account? If not, there is a high chance that a single disaster hits both the live data and the backups.
  2. Is everyone on the team capable of bringing back the whole system? At least a few people? If not, what can we do when the person who holds the most context is not available?
  3. Imagine we successfully got our systems back, but the teams integrated with us can’t use the recovered software. Integration issues are one of the most common reasons for software failures. If we haven’t tested the integration beforehand, assume it’s not going to work.

There are more such situations that can deepen the business loss from a disaster, and we need to prepare well in advance to recover as quickly as possible.

How to prepare for disaster recovery:

We need to understand two things before we start working on disaster recovery, and both of them are measured in time.

  1. Recovery Point Objective (RPO): This defines how close to the time of the disaster we must be able to restore the data. In other words, it is the maximum amount of data, expressed as a window of time, that the business can afford to lose.
  2. Recovery Time Objective (RTO): This defines the maximum time the business can tolerate between the disaster and the software working normally again.

For example, if the business says the RPO is 5 minutes and the RTO is 10 minutes, then we need to back up data at least every 5 minutes, and we must be able to bring our software back within 10 minutes of any disaster.
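To make the two objectives concrete, here is a minimal Python sketch that checks a single incident against the 5-minute RPO and 10-minute RTO from the example. The function names and the timestamps are made up for illustration; in a real setup the times would come from your backup and monitoring systems.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, disaster_time: datetime, rpo: timedelta) -> bool:
    """True if the data written since the last backup spans no more than the RPO."""
    return disaster_time - last_backup <= rpo


def meets_rto(disaster_time: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """True if the software was working again within the RTO."""
    return service_restored - disaster_time <= rto


# Objectives from the example in the text.
rpo = timedelta(minutes=5)
rto = timedelta(minutes=10)

# Hypothetical incident timeline (illustrative values only).
disaster = datetime(2021, 2, 3, 12, 0)
last_backup = datetime(2021, 2, 3, 11, 56)   # 4 minutes of data would be lost
restored = datetime(2021, 2, 3, 12, 12)      # 12 minutes of downtime

rpo_ok = meets_rpo(last_backup, disaster, rpo)
rto_ok = meets_rto(disaster, restored, rto)
print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this made-up timeline the backup schedule is good enough for the RPO, but the recovery itself takes too long for the RTO, so the effort should go into speeding up the restore rather than into more frequent backups.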

Achieving the RPO and RTO isn’t easy unless the team puts enough effort into some of these aspects.

  1. Automation: Automation plays a key role in speeding things up, and for disaster recovery it is the most important thing we need. Invest time in automating infrastructure and practice automated deployments, so that when a disaster happens you are just a button click away from recovering the software. A team that does not follow automation rigorously will find it very difficult to recover from a disaster.
  2. Frequent disaster recovery tests: Depending on your software and the teams it is integrated with, it is highly advisable to run disaster recovery tests frequently and make sure the integrations with other teams still work after the software is recovered. It is better to have a separate environment for disaster recovery, run the tests there, and try to integrate with other teams from it. Configurations often become specific to one environment, and integrations might break when the software is deployed to completely different servers and regions. Testing frequently is the only way to mitigate this issue.
  3. The more people, the better: Always make sure there are enough people on the team who can do disaster recovery. In hard times it is important that anyone on the team is capable of recovering the software. This also removes knowledge gaps and pushes everyone to go beyond their boundaries and understand the software inside out.
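The first two points can be combined into an automated drill that anyone on the team can run. The following is only a rough sketch: `run_drill`, the step names, and the check names are all hypothetical, and the lambdas stand in for real automation such as provisioning infrastructure, restoring a backup, and calling the health endpoints of integrated teams.

```python
import time
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], None]]
Check = Tuple[str, Callable[[], bool]]


def run_drill(recovery_steps: List[Step],
              integration_checks: List[Check],
              rto_seconds: float) -> Dict[str, object]:
    """Run every recovery step, time the whole recovery, then run integration checks."""
    start = time.monotonic()
    for _name, step in recovery_steps:
        step()  # each step is assumed to raise an exception on failure
    elapsed = time.monotonic() - start

    # Collect the names of integrations that do not work after recovery.
    failed = [name for name, check in integration_checks if not check()]
    return {
        "elapsed_seconds": elapsed,
        "within_rto": elapsed <= rto_seconds,
        "failed_checks": failed,
    }


# Placeholder steps and checks (hypothetical names, no real systems involved).
steps: List[Step] = [
    ("provision-infrastructure", lambda: None),
    ("restore-latest-backup", lambda: None),
    ("deploy-application", lambda: None),
]
checks: List[Check] = [
    ("orders-team-api", lambda: True),
    ("billing-team-api", lambda: False),  # simulates a broken integration
]

report = run_drill(steps, checks, rto_seconds=600)
print(report["within_rto"], report["failed_checks"])
```

Running something like this on a schedule in a dedicated disaster recovery environment shows both whether the recovery fits inside the RTO and which integrations broke, long before a real disaster does.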

Conclusion:

As software development teams, we always discuss performance, maintenance and releasing products. But if we fail to discuss unexpected incidents like disasters, the impact on the business can be severe: one such incident can take away a huge amount of data, money and our most valued customers.

If you haven’t thought through disaster recovery yet, this is the time to raise it with your team. I strongly recommend making a disaster recovery plan one of the non-functional requirements for any software that is developed and released.
