Pray for Better, Prepare for Worse

All things mechanical will fail. A lack of sound disaster recovery procedures should keep a knowledgeable IT administrator awake at night. Measures to prevent data loss figure in many recovery scenarios, which makes them a useful vehicle for discussing the broader need to practice disaster recovery procedures.

Data backups are a key component of disaster recovery. Recovering from the failure of a complex system requires planning and training. IT administrators and operators should not feel comfortable simply because backup software has been deployed. That comfort should come only from continual practice of recovery procedures. Operators who practice will also be better prepared to carry out those procedures under the pressure of a real system failure, and recovering data will become less of an exceptional task and more of a routine one.

Having IT operators routinely perform backups and recoveries lets them check the backup hardware. Checking the recorded data and verifying that it can be recovered are necessary steps toward preparedness. Checking that data from older systems can be recovered onto newer systems may also be important. IT operators should practice different types of recoveries to verify that data can be recovered in different scenarios.
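One way to check recovered data during a practice run is to compare checksums of the original files against the restored copies. The following is a minimal sketch in Python, assuming the source data and a test restore are both available as ordinary directories. The function names are hypothetical and not part of any backup product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each regular file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(source: Path, restored: Path) -> dict:
    """Report files that are missing, unexpected, or changed after a restore."""
    src = tree_digests(source)
    dst = tree_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "unexpected": sorted(set(dst) - set(src)),
        "changed": sorted(k for k in set(src) & set(dst) if src[k] != dst[k]),
    }
```

An operator could call compare_trees with the live data directory and the directory used for the test restore, then treat any non-empty list in the result as a finding to investigate. Files that legitimately changed after the backup was taken will show up as "changed", so the comparison is most meaningful against a quiesced or snapshotted source.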

The routine practice of backup and recovery procedures also verifies the completeness and correctness of the recovery plans, and it provides the opportunity to measure how effective and how fast those plans are. Key measurements include the time needed for recovery and the amount of data lost between the most recent backup and the point of failure. Repeated validation of the procedures will provide opportunities for refinement. Special cases in the recovery process should be minimized for each system. Reducing both recovery time and data loss is work that can continue indefinitely.
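The two measurements named above reduce to simple date arithmetic once a practice run records three timestamps. The sketch below is illustrative only: the function name and the example times are assumptions, and a real drill log would supply the values.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup: datetime,
                     failure: datetime,
                     service_restored: datetime):
    """Return (data_loss_window, recovery_time) for one recovery drill.

    data_loss_window: work done between the last good backup and the failure.
    recovery_time:    elapsed time from the failure until service is back.
    """
    if not (last_backup <= failure <= service_restored):
        raise ValueError("timestamps must be in chronological order")
    return failure - last_backup, service_restored - failure


# Hypothetical drill: nightly backup at 02:00, simulated failure at 09:30,
# service restored at 11:15 the same day.
loss, downtime = recovery_metrics(
    datetime(2024, 1, 15, 2, 0),
    datetime(2024, 1, 15, 9, 30),
    datetime(2024, 1, 15, 11, 15),
)
print(loss)      # 7:30:00
print(downtime)  # 1:45:00
```

Recording these two numbers for every drill gives a trend line, which makes it easier to see whether a change to the procedure actually shortened the recovery or narrowed the window of lost data.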

Plan. Deploy. Practice. Repeat.

Technology that enhances disaster recovery preparedness continues to evolve. File systems that help provide consistent snapshots for backups, for instance, are being deployed. Adopting such systems may complicate recovery procedures, however. Trouble with a Logical Volume Manager during a bare metal (“nuke”) restoration from a rescue CD is the kind of problem that may be discovered only through practice.
