Setting up Disaster Recovery

Disaster Recovery (DR) is a procedure that enables the recovery or continuation of a system following a disaster.

The Harmony disaster recovery utility is used when users require a DR setup, so that when the primary data center fails, service can be recovered on the disaster recovery setup running in another data center.

Initial Primary and Disaster Recovery Site Configuration

The following steps prepare the primary and DR sites so that recovery is possible:

  • Set up the primary and DR Harmony Controllers with similar hardware and software configuration.
  • Configure SSL certificates for FQDN and email server information using the A10 Harmony Controller Operator Console in the Certificate section under Configuration Management.
  • Perform the one-time setup on both systems.
  • Create password-less authentication as indicated in the Setup Procedure.
  • Schedule the periodic backup bundle to synchronize with the DR setup.

Note: In a multi-master environment, Harmony Controller does not go down when node0 is down. However, the disaster recovery utility stops working until node0 is operational again.

The diagram below shows the setup prior to restore:

_images/dr01.png

The diagram below shows the setup after restore:

_images/dr02.png

Scheduling Data Transfer to Disaster Recovery Site

Pre-Requisites

For an active or passive disaster recovery (DR) configuration, the following considerations are essential:

  • On node0, where the periodic backup is performed, provision storage over and above the recommended configuration. Where metrics are stored, provision twice the recommended storage capacity on node0. This requirement applies to node0 only. For appliance systems this is built in, so no action is needed.
  • Access to the following NFS ports must be enabled between the nodes:

Port Numbers                            Description
111 (TCP and UDP), 2049 (TCP and UDP)   NFS Server
1110 TCP                                Cluster
1110 UDP                                Client Status
4045 (TCP and UDP)                      NFS Lock Manager
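
Before scheduling backups, it can help to confirm that the TCP ports listed above are reachable from the peer node. The following is a minimal sketch, not part of the Harmony utilities: the function names are hypothetical, it assumes bash and the coreutils timeout command are available, and it probes TCP only (the UDP entries are not checked).

```shell
# TCP ports taken from the table above; UDP is not probed by this sketch.
NFS_TCP_PORTS="111 2049 1110 4045"

# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
check_tcp_port() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one line per port: "<port> open" or "<port> closed".
report_nfs_ports() {
  local host=$1 port
  for port in $NFS_TCP_PORTS; do
    if check_tcp_port "$host" "$port"; then
      echo "$port open"
    else
      echo "$port closed"
    fi
  done
}

# Usage (the address is only an example):
#   report_nfs_ports 192.168.21.2
```

A "closed" result can mean either that a firewall blocks the port or that no service is listening on it, so treat the output as a starting point for troubleshooting.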

Setting up FQDN

  • Configure SSL certificates for FQDN and email server information using A10 Harmony Controller Operator Console in the Certificate section under Configuration Management.
  • The FQDN should be mapped to the node0 IP address.

Setup Procedure

  1. Assuming the primary Harmony Controller is already set up, deploy the DR Harmony Controller with similar hardware and software version. After deploying it with IP addresses, import the SSL certificate and configure the FQDN using the Operator Console.
  2. Run dr_setup.sh on both the primary and DR Harmony Controllers for a one-time setup.

Example:

dr_setup.sh --esreponame='esharmony' --envsetuptarfile=/opt/envsetup.tar.gz

Here, envsetuptarfile is the path where the Harmony installation files are located.

  3. Key in the auth type.

    For password-less: Add the existing public key (id_rsa.pub) of the source machine to the target machine’s /home/<user>/.ssh/authorized_keys. Note down the user information for the key being used.

    For key based:
      • Generate an SSH key pair using the ssh-keygen command and provide a name for the key (login with .pem file).
      • Copy the public key to the target machine’s authorized_keys (ssh-copy-id -i ~/key.pub root@192.168.21.2).
      • Verify by performing ssh from the source to the target machine using the private key (ssh -i key.pem root@192.168.21.2).

  4. Configure the crontab on the source machine using the following syntax.

Example:

00 * * * * /root/a10-harmony-controller-4.2.0/utilities/harmony_backup.sh --auth_type=passwordless  --remotehost=54.87.207.20  --remoteuser=admin
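
In the entry above, the schedule fields "00 * * * *" run the backup at minute 0 of every hour. As a minimal sketch of how such an entry could be added without creating duplicates, the helper functions below are hypothetical (they are not shipped with Harmony Controller) and use only the arguments shown in the example above.

```shell
# Build the hourly cron line from the script path, remote host, and remote user.
build_backup_cron_line() {
  local script=$1 host=$2 user=$3
  printf '00 * * * * %s --auth_type=passwordless --remotehost=%s --remoteuser=%s\n' \
    "$script" "$host" "$user"
}

# Read an existing crontab on stdin and print it with the given line
# appended only if an identical line is not already present.
merge_cron_line() {
  local line=$1 existing
  existing=$(cat)
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  if ! printf '%s\n' "$existing" | grep -Fxq -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage (values taken from the example above):
#   line=$(build_backup_cron_line \
#     /root/a10-harmony-controller-4.2.0/utilities/harmony_backup.sh 54.87.207.20 admin)
#   crontab -l 2>/dev/null | merge_cron_line "$line" | crontab -
```

Keeping the merge step separate from the crontab command makes it possible to review the resulting table before installing it.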

The system is now set up to create a periodic backup bundle and transfer it to the target system. To view the options supported by the backup utility:

cd /home/admin/a10-harmony-controller-4.2.1/utilities/harmony_backup
./harmony_backup.sh -h

Best Practices of Disaster Recovery

Best practices to follow for disaster recovery:

  • Periodically verify the logs to ensure the operations are working without any issues.
  • Perform periodic drills to ensure the DR process is fully working.
  • Create and maintain the Standard Operating Procedure (SOP).
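
The periodic log verification above can be partly automated. The following is a minimal sketch under stated assumptions: the log locations /var/log/HC_BACKUP and /var/log/HC_RESTORE are the ones named later in this document, but the log format is not documented here, so the sketch simply assumes that failures are logged with the word "error". The function name is hypothetical.

```shell
# Count lines containing "error" (any case) in a log file or directory tree.
# Prints "missing" and returns 1 if the path does not exist.
count_error_lines() {
  local path=$1
  if [ ! -e "$path" ]; then
    echo "missing"
    return 1
  fi
  grep -rhi -- 'error' "$path" | wc -l | tr -d ' '
}

# Usage:
#   for log in /var/log/HC_BACKUP /var/log/HC_RESTORE; do
#     echo "$log: $(count_error_lines "$log")"
#   done
```

A non-zero count only indicates that the logs deserve a closer look; it does not replace reading them or running a full DR drill.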

Scheduling Data Restore at Disaster Recovery Site

The restore utility can be run periodically, for example every day, to update the DR system with the latest available data:

./harmony_restore.sh -h

The disaster recovery utilities are available under the ‘utilities’ directory of the Harmony installation directory. Backup and restore operations are logged on the respective system under /var/log/HC_BACKUP and /var/log/HC_RESTORE.

Example 1: Restore from remote storage system without metrics data:

./harmony_restore.sh  --remoteuser=admin --remotehost=192.168.10.10 --remotelocation=/home/admin/ --metrics=no --auth=passwordless

Example 2: Restore from specific backup file for advanced users:

./harmony_restore.sh --configds0=/home/admin/cds0.tar.gz --tdm=/home/admin/20181030_123045 --schemaregistry=/home/admin/acos_backup.tar.gz --metricsdatastore='esharmony_2019-01-24-16:00:01' --selectiverestore='true'
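
Because Example 2 names individual backup files, it may be worth confirming that each path exists before starting a selective restore. The following is a minimal sketch with a hypothetical helper name. It applies only to the path arguments (configds0, tdm, schemaregistry); metricsdatastore is a snapshot name, not a path, so it is not checked.

```shell
# Print every path that does not exist; return 1 if any is missing.
missing_restore_inputs() {
  local f status=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return $status
}

# Usage (paths taken from Example 2):
#   missing_restore_inputs /home/admin/cds0.tar.gz /home/admin/20181030_123045 \
#     /home/admin/acos_backup.tar.gz && echo "all inputs present"
```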

Procedure at the Time of Disaster

Pre-Requisite

If any of the nodes are operational, bring down all the nodes.

Procedure

When the primary Harmony Controller is not available, the following steps are performed to recover the DR Harmony Controller:

  • Log in to the DR Harmony Controller and follow the SOP, which includes running the restore utility, harmony_restore.sh. Ensure the restore operation completes without any error.

Example:

/root/a10-harmony-controller-4.2.0/utilities/harmony_restore.sh
where the path refers to the installation directory of Harmony Controller.
  • Change the DNS record for the FQDN to point to the new IP address.
  • Verify the devices are registered automatically in the DR system and all operations are working satisfactorily.
  • If the dataset is large, the historical metrics data may take some time to show up in the dashboard.
  • Enable the backup in the DR Harmony Controller to create a backup bundle from the new system. Follow the same process used while setting up the primary Harmony Controller. Although the remote system is not yet available, the backup bundle is pushed once that system becomes available.

Note: Remove the crontab entry on the primary Harmony Controller when it is brought up. This avoids a backup bundle mismatch between the two systems when synchronization happens from the DR Harmony Controller.
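
After the DNS record is changed, it can be useful to confirm from a client host that the FQDN resolves to the DR address. The following is a minimal sketch, assuming a Linux host where the getent command is available. The function names and the FQDN in the usage comment are hypothetical.

```shell
# Read resolver output on stdin (address in the first column) and
# return 0 if the expected address is present.
addresses_include() {
  awk '{print $1}' | grep -Fxq -- "$1"
}

# Return 0 if the FQDN currently resolves to the expected IP on this host.
fqdn_points_to() {
  getent ahosts "$1" | addresses_include "$2"
}

# Usage (hypothetical FQDN and address):
#   if fqdn_points_to harmony.example.com 192.168.10.10; then
#     echo "FQDN points to the DR system"
#   else
#     echo "FQDN does not resolve to the DR system yet"
#   fi
```

Cached DNS records can delay the change on some hosts, so a negative result shortly after the switchover does not necessarily mean the record is wrong.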

Procedure when Primary Site is Back

Pre-Requisites

  • Remove the backup configuration from the crontab.
  • Allow the new data sets to be synced in the primary system.

Procedure

When the primary site is back, bring up the Harmony Controller, and perform the following steps:

  • For switchover from DR to primary, perform a restore operation with the new backup bundle on the primary system. This brings the data points up to date.
  • Bring down the DR system.
  • Perform the DNS switchover for the FQDN to the IP address of the primary system. The devices will connect back to the primary system. Ensure all operations are satisfactory.
  • Configure the automated backup in the crontab of the primary system.
  • Bring back the DR setup and remove the crontab entries in the DR setup.

Note: Ensure the backup direction is from the primary to the DR setup.

Disaster Recovery Metrics

The utility performs rotational retention based on the configuration. By default, the last six backups are retained. The directory structure is synchronized with the target system, either the disaster recovery Harmony Controller or an optional storage server, at the period configured in the crontab (by default, one hour).
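
To illustrate what rotational retention means, the following minimal sketch lists which entries would be dropped when only the newest six are kept. It is not the utility's actual implementation: the function name is hypothetical, and it assumes each backup is an entry in one directory whose name sorts chronologically (for example, a timestamp such as 20181030_123045).

```shell
# List the oldest entries in a directory that exceed the retention count.
# $1: backup directory, $2: number of backups to keep (default 6).
list_backups_to_prune() {
  local dir=$1 keep=${2:-6} total excess
  total=$(ls -1 "$dir" | wc -l | tr -d ' ')
  excess=$((total - keep))
  if [ "$excess" -gt 0 ]; then
    ls -1 "$dir" | sort | head -n "$excess"
  fi
}

# Usage (dry run only; nothing is deleted). The default remotelocation
# from this document is used as the example path:
#   list_backups_to_prune /a10harmony/harmony_backup 6
```

The sketch only prints candidates, which keeps it safe to run against a real backup location.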

The metrics can be of high volume depending on the deployment, so it may take some time for the metrics database to populate and show in the dashboard. The typical metrics recovery rate is 3 GB per minute.
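
Using the typical rate stated above, the metrics recovery time can be estimated from the size of the metrics data. The following is a minimal sketch with a hypothetical function name. It assumes whole gigabytes and rounds up to the next full minute; actual times depend on the deployment.

```shell
# Estimate metrics recovery time in minutes.
# $1: metrics size in GB, $2: recovery rate in GB per minute (default 3).
estimate_metrics_restore_minutes() {
  local size_gb=$1 rate_gb_per_min=${2:-3}
  # Integer ceiling division: round any partial minute up.
  echo $(( (size_gb + rate_gb_per_min - 1) / rate_gb_per_min ))
}

# Usage:
#   estimate_metrics_restore_minutes 90    # prints 30
#   estimate_metrics_restore_minutes 10    # prints 4
```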

Recovery Point Objective (RPO) is the maximum period of time for which data may be lost, and Recovery Time Objective (RTO) is the time taken to restore the service.

The disaster recovery metrics for Harmony Controller are:

  • With hourly backups, the RPO is at most 1 hour.
  • The RTO is the sum of the time needed for:
    • Powering up the DR system to service
    • Responding to the issue
    • Restoring the backup
    • Moving the FQDN

For example, an RTO could be made up of:

  • 10 minutes to startup
  • 10 minutes to respond
  • 5 minutes to recover from backup
  • 5 minutes for FQDN switch over

Total time is 30 min.
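
The same arithmetic can be repeated with the figures measured during a DR drill. The following is a minimal sketch with a hypothetical helper name that adds up the per-step durations in minutes.

```shell
# Sum a list of durations given in whole minutes.
sum_minutes() {
  local total=0 m
  for m in "$@"; do
    total=$((total + m))
  done
  echo "$total"
}

# Usage with the example figures above
# (startup, response, restore, FQDN switchover):
#   sum_minutes 10 10 5 5    # prints 30
```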

Advanced Back-Up Options

Here are a few examples of advanced backup options, followed by the supported arguments:

Examples:

./harmony_backup.sh  --remoteuser=admin --remotehost=192.168.1.13 --remotelocation=/home/admin/ --auth=passwordless

./harmony_backup.sh  --remoteuser=admin --remotehost=192.168.1.13 --remotelocation=/home/admin/ --auth=passwordless  --metrics=no

./harmony_backup.sh  --remoteuser=admin --auth=keybased --accesskeypath=/home/admin/key --metrics=no

./harmony_backup.sh  --auth=aws --bucketpath=<path of bucket>

retention: The maximum number of backups to retain. Default value is 6.

auth: passwordless|keybased|aws

  • If you enter passwordless, you must provide remoteuser, remotelocation, and remotehost.
  • If you enter keybased, you must provide accesskeypath, remoteuser, remotelocation, and remotehost.
  • If you enter aws, you must provide bucketpath. ACCESSKEY and SECRETKEY are taken from aws-secret. Make sure the keys under aws-secret have permission to upload files to S3.

Argument          Description
remotelocation    Location on the remote server where Harmony Controller backups are stored. Default location is /a10harmony/harmony_backup
remotehost        Hostname or IP address of the Secure Copy Protocol (SCP) server
accesskeypath     Path of the key file (for example, key.pem)
remoteuser        SCP user name
bucketpath        S3 bucket path where the backup will be moved
force             This will first run backup on Harmony Controller components
metrics           If --metrics=no, the utility does not take a metrics backup. Default value is yes.
metricssnapshot   Metrics snapshot name. Default value is esharmony.
metricsrepo       Name of the repository registered in Elasticsearch. Default value is esharmony.
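
The per-auth requirements listed above can be checked before a backup command is scheduled. The following is a minimal sketch of such a pre-flight check, not part of harmony_backup.sh. The function names are hypothetical, and the key-file argument is taken to be accesskeypath, as used in the examples and the argument table. Arguments with documented defaults, such as remotelocation, may in practice be optional.

```shell
# Print the argument names this document lists as required for an auth type.
required_args_for_auth() {
  case "$1" in
    passwordless) echo "remoteuser remotelocation remotehost" ;;
    keybased)     echo "accesskeypath remoteuser remotelocation remotehost" ;;
    aws)          echo "bucketpath" ;;
    *)            echo "unknown auth type: $1" >&2; return 1 ;;
  esac
}

# Print the required arguments that are absent from a command line.
# $1: auth type; remaining parameters: the --name=value arguments.
missing_args() {
  local auth=$1 name missing=""
  shift
  for name in $(required_args_for_auth "$auth"); do
    case " $* " in
      *" --${name}="*) ;;                 # argument is present
      *) missing="$missing $name" ;;      # argument is absent
    esac
  done
  echo "${missing# }"
}

# Usage:
#   missing_args keybased --remoteuser=admin --accesskeypath=/home/admin/key
#   # prints: remotelocation remotehost
```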

Advanced Restore Options

Here are a few examples of advanced restore options, followed by the supported arguments:

Examples:

./harmony_restore.sh  --remoteuser=admin --remotehost=192.168.10.10 --remotelocation=/home/admin/ --auth=passwordless

./harmony_restore.sh  --remoteuser=admin --remotehost=192.168.10.10 --remotelocation=/home/admin/ --metrics=no --auth=passwordless

./harmony_restore.sh

./harmony_restore.sh --metrics=no

./harmony_restore.sh --selectiverestore='true' --configds0=/home/admin/cds0.tar.gz --tdm=/home/admin/20181030_123045 --schemaregistry=/home/admin/acos_backup.tar.gz --metricsdatastore='esharmony_2019-01-24-16:00:01'

./harmony_restore.sh --selectiverestore='true' --configds0=/home/admin/cds0.tar.gz --tdm=/home/admin/20181030_123045 --schemaregistry=/home/admin/acos_backup.tar.gz

Argument           Description
configds0          Path of Config datastore 0 backup file
configds1          Path of Config datastore 1 backup file
configds2          Path of Config datastore 2 backup file
tdm                Path of TDM backup file
schemaregistry     Path of metrics ingestor backup file
metricsdatastore   Name of Elasticsearch snapshot
remotehost         Example: 192.168.10.10
remotelocation     Example: /home/admin
localpath          harmony_backup directory path. Default value is /a10harmony
remoteuser         Example: admin (rsync user)
s3bucketpath       Example: s3://backups/harmony_backup
auth               passwordless|keybased|aws
accesskeypath      .pem file path to access remotehost
metrics            Default is yes.
selectiverestore   Default value is false. If you are restoring from a specific backup file, set the value to true.
metricssnapshot    Metrics snapshot name. Default value is esharmony.