Hot, Warm, Cold – DR & HA Strategies for IBM i
Does your company have a Disaster Recovery Plan?
Most people confidently say, “Yes, of course!”
In my experience visiting hundreds of clients over the last 30 years, most companies DO NOT have a solid DR plan. If they have a plan, it is not documented, updated, or tested annually. If you ask three different people for their definition of a Hot, Warm, or Cold site, you will surely get three different answers.
Two important acronyms are your starting point: RPO and RTO.
If and when your business experiences an unplanned outage, you will do your best to have a Recovery Point at the moment the Production machine kicked the bucket. Unfortunately, if you restore from the last tape backup that was created 48 hours ago, your Recovery Point is 48 hours. If your boss has an RPO (Recovery Point Objective) of 30 minutes, then you are in the hot seat! Ask your boss about his/her expectations for RPO, and try to clue him/her in on the type of backups you currently have.
The production system is down, and you have the tapes that were created 48 hours ago. Let’s assume for this scenario that the system outage was due to a hardware failure and it took 1 day to get it fixed. The hardware is now fixed, and you begin to load up your tapes and IPL the machine; this takes you through the night. The system has been out of commission for at least 24 hours; this is your Recovery Time. If your boss has an RTO (Recovery Time Objective) of 2 hours, you are now dusting off your resume.
Your best contingency plan is to communicate the existing Recovery Point and Recovery Time to your management. Find out what their expectations and needs are for the business. Offer them the VERY BEST options, and keep records of your suggestions. These two acronyms will determine what type of DR strategy you end up with: Hot, Warm, or Cold. The shorter the required RPO and RTO, the higher the cost, due to more infrastructure, resources, and overhead.
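
The scenario above comes down to simple arithmetic, and it can help to put real numbers in front of management. Here is a minimal sketch in Python that compares an actual Recovery Point and Recovery Time against the objectives. The function name `recovery_gap`, the dictionary keys, and the timestamps are all hypothetical and chosen only for illustration; the 48-hour, 24-hour, 30-minute, and 2-hour figures are the ones from the scenario.

```python
from datetime import datetime, timedelta


def recovery_gap(last_backup, outage_start, service_restored, rpo, rto):
    """Compare what actually happened against the stated objectives.

    Recovery Point = how much data is lost (outage start minus last good backup).
    Recovery Time  = how long the business was down (restore time minus outage start).
    """
    recovery_point = outage_start - last_backup
    recovery_time = service_restored - outage_start
    return {
        "recovery_point": recovery_point,
        "recovery_time": recovery_time,
        "rpo_met": recovery_point <= rpo,
        "rto_met": recovery_time <= rto,
    }


# Hypothetical timestamps; only the durations come from the scenario.
outage = datetime(2024, 1, 3, 8, 0)
result = recovery_gap(
    last_backup=outage - timedelta(hours=48),       # tapes are 48 hours old
    outage_start=outage,
    service_restored=outage + timedelta(hours=24),  # back up a day later
    rpo=timedelta(minutes=30),                      # the boss's RPO
    rto=timedelta(hours=2),                         # the boss's RTO
)

for key, value in result.items():
    print(f"{key}: {value}")
```

With these inputs both `rpo_met` and `rto_met` come back `False`, which is exactly the conversation you want to have with management before the outage rather than after it.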
Let’s next address the differences between Hot, Warm, and Cold.
Hot Site: This is the ultimate, and the most expensive. Most people mistakenly call their secondary site a “Hot Site”. The term sounds good to those upper-level execs who doled out the bucks for your contingency plan. However, unbeknownst to the C-level folks, your Hot Site may not be so hot! A true Hot Site will have a steady duplication of your production data, machine settings, application software, network equipment, and servers, all running, tested, and updated regularly. A Hot Site will be humming along like a spare engine, waiting to run your business rapidly in the event of a failure at the main production site. (Notice I did not call this HA.) This is a basic DR Hot Site with two nodes: (1) Production and (2) DR. By adding a third node (including server, storage, etc.), you would then have HA, or High Availability. HA equipment typically resides next to Production and is used primarily for planned outages, like updates to the Production system in a 24×7 shop. The HA server allows a business with 24×7 requirements to have zero downtime and more flexibility in keeping all systems up to date.
Even if you have a Hot Site and your systems are being replicated, you could still lose the transactions “in flight” over the data lines. This gotcha is why many companies consider adding a VTL (Virtual Tape Library) to their environment. A VTL solution allows your system to back up to solid state storage. Backups are much faster, as are restores. A VTL can reduce your RPO and RTO dramatically.
Warm Site: This solution is typically chosen by clients that can be down for longer periods of time. The boss has calculated that the business will not “go under” and can take more time putting Humpty Dumpty back together again. A popular method for a Warm Site is to simply have a Virtual Tape backup to the cloud. The data is stored remotely on a storage appliance. When a disaster has been declared, the Warm Site personnel restore the virtual tape backup to another Power System. Your business remotely accesses the restored system and is back to work, but possibly in a limited sense. As you can imagine, the Warm Site could have some missing links to your network environment. Most environments are no longer simple standalone AS/400s with 5250 dumb terminals (yes, I’m that old). Even though a Warm Site may take longer to restore, the cost is lower, and it may be the best fit for many small businesses.
Cold Site: This is bare-bones simple and the lowest cost. Cold, as in four concrete walls with nothing inside other than power receptacles. IBM offers some Cold Site plans that include a list of hardware that you have mutually agreed upon. There is a hefty monthly cost for this Cold Site service, and it has the longest RPO and RTO. However, it can be approved by an auditor and an insurance company.
We have assisted hundreds of clients with recovery plans for their IBM Power environments and other environments, and would welcome the opportunity to learn about your business needs. Together we can devise a plan that fits your budget and meets your business requirements. Give us a call at 800-771-7000 or submit a request online!
By Steve Marinak – VP, Midrange Business Unit