Tech-eye-Tech: NetApp Metro Cluster DR Test

Friday, February 8, 2013


Metro Cluster Disaster Recovery Test

Description	To test Disaster Recovery, you must restrict access to the disaster site node to prevent the node from resuming service. If you do not, you risk the possibility of data corruption.
Failover Procedure	1. Stop the ISL connections between the sites. Switch-Site-01 and Switch-Site-02 are located at the disaster site, where filer FilerA is located. The network team must log in to these switches and disable the ISL ports.
Note: Once the ISL is stopped, automatic failover is disabled. FilerB cannot automatically take over FilerA.

2. Once the ISL is stopped, issue the command below on FilerB and confirm that the disks of FilerA appear in a failed state. The same command can be run on FilerA as well.

    FilerB> aggr status -r
    Aggregate aggr0 (online, raid_dp, mirror degraded) (block checksums)
      Plex /aggr0/plex0 (online, normal, active, pool0)
        RAID group /aggr0/plex0/rg0 (normal)

          RAID Disk Device       HA SHELF BAY CHAN Pool Type RPM   Used (MB/blks)   Phys (MB/blks)
          ------------------------------------------------------------------------------------------
          dparity   SITEA03:5.16 0b 1     0   FC:B 0    FCAL 10000 272000/557056000 280104/573653840
          parity    SITEA02:5.32 0c 2     0   FC:A 0    FCAL 10000 272000/557056000 280104/573653840
          data      SITEA03:6.16 0d 1     0   FC:B 0    FCAL 10000 272000/557056000 280104/573653840

      Plex /aggr0/plex1 (offline, failed, inactive, pool1)
        RAID group /aggr0/plex1/rg0 (partial)

          RAID Disk Device       HA SHELF BAY CHAN Pool Type RPM   Used (MB/blks)   Phys (MB/blks)
          ------------------------------------------------------------------------------------------
          dparity   FAILED       N/A 272000/557056000
          parity    FAILED       N/A 272000/557056000
          data      FAILED       N/A 272000/557056000
          Raid group is missing 3 disks.
    FilerB>

3. Connect to the Remote LAN Management (RLM) console on site B. Stop and power off the NetApp controller.

    FilerA> halt
    Boot Loader version 1.2.3
    Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
    Portions Copyright (C) 2002-2006 NetApp Inc.
    CPU Type: Dual Core AMD Opteron(tm) Processor 265
    OK>
    RLM FilerA> system power off

4. Log in to FilerB and execute a forced takeover.

    FilerB> cf forcetakeover -d
    FilerB(takeover)> aggr status -v
    Aggr  State   Status           Options
    aggr0 online  raid_dp, aggr    root, diskroot, nosnap=off,
                  mirror degraded  raidtype=raid_dp, raidsize=16,
                                   ignore_inconsistent=off,
                                   snapmirrored=off,
                                   resyncsnaptime=60,
                                   fs_size_fixed=off,
                                   snapshot_autodelete=on,
                                   lost_write_protect=on
        Volumes: vol0
        Plex /aggr0/plex0: online, normal, active
          RAID group /aggr0/plex0/rg0: normal
        Plex /aggr0/plex1: offline, failed, inactive

    FilerB/FilerA> aggr status -v
    Aggr  State   Status           Options
    aggr0 online  raid_dp, aggr    root, diskroot, nosnap=off,
                                   raidtype=raid_dp, raidsize=16,
                                   ignore_inconsistent=off,
                                   snapmirrored=off,
                                   resyncsnaptime=60,
                                   fs_size_fixed=off,
                                   snapshot_autodelete=on,
                                   lost_write_protect=on
        Volumes: vol0
        Plex /aggr0/plex1: online, normal, active
          RAID group /aggr0/plex1/rg0: normal
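The plex states reported by aggr status -v during the failover can also be checked from a script instead of by eye. The following is a minimal Python sketch, assuming the console output has already been captured as text by some other means. The names parse_plex_states and failed_plexes are hypothetical helpers for illustration and are not part of any NetApp tooling; the sample text is copied from the takeover output above.

```python
import re

# Matches lines such as "Plex /aggr0/plex1: offline, failed, inactive"
# as printed by `aggr status -v` (assumed format, taken from the capture).
PLEX_LINE = re.compile(r"Plex\s+(/\S+?):\s*([a-z\-]+(?:,\s*[a-z\-]+)*)")


def parse_plex_states(status_text):
    """Return {plex_path: [state, ...]} from captured `aggr status -v` text."""
    plexes = {}
    for match in PLEX_LINE.finditer(status_text):
        path, states = match.groups()
        plexes[path] = [s.strip() for s in states.split(",")]
    return plexes


def failed_plexes(status_text):
    """Return the sorted paths of all plexes whose state list contains 'failed'."""
    parsed = parse_plex_states(status_text)
    return sorted(path for path, states in parsed.items() if "failed" in states)


# Sample: the plex section of the output seen on FilerB after the takeover.
SAMPLE = """\
Volumes: vol0
Plex /aggr0/plex0: online, normal, active
RAID group /aggr0/plex0/rg0: normal
Plex /aggr0/plex1: offline, failed, inactive
"""

if __name__ == "__main__":
    print(parse_plex_states(SAMPLE))
    print(failed_plexes(SAMPLE))
```

During the test, the remote plex (plex1 here) is expected to be listed as failed while the ISL is down; an empty list at that point would suggest the sites are still connected.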
Failback Procedure	1. After testing, enable the ISL ports: log in to the Switch-Site-01 and Switch-Site-02 switches and enable the ports. The plexes that belong to FilerA will be in an out-of-date state and must be resynchronized manually.

    FilerB/FilerA> aggr status -v
    Aggr     State   Status           Options
    aggr0(1) failed  raid_dp, aggr    diskroot, raidtype=raid_dp,
                     out-of-date      raidsize=16, resyncsnaptime=60,
                                      lost_write_protect=off
        Volumes:
        Plex /aggr0(1)/plex0: offline, normal, out-of-date
          RAID group /aggr0(1)/plex0/rg0: normal
        Plex /aggr0(1)/plex1: offline, failed, out-of-date

    aggr0    online  raid_dp, aggr    root, diskroot, nosnap=off,
                                      raidtype=raid_dp, raidsize=16,
                                      ignore_inconsistent=off,
                                      snapmirrored=off,
                                      resyncsnaptime=60,
                                      fs_size_fixed=off,
                                      snapshot_autodelete=on,
                                      lost_write_protect=on
        Volumes: vol0
        Plex /aggr0/plex1: online, normal, active
          RAID group /aggr0/plex1/rg0: normal

2. Re-create the mirror for each aggregate.

    FilerB/FilerA> aggr mirror aggr0 -v aggr0(1)

3. Wait for all aggregates to finish resynchronizing, then check their status.

    [FilerB/FilerA: raid.mirror.resync.done:notice]: /aggr0: resynchronization completed in 0:03.36
    FilerB/FilerA> aggr status -v
    Aggr  State   Status           Options
    aggr0 online  raid_dp, aggr    root, diskroot, nosnap=off,
                  mirrored         raidtype=raid_dp, raidsize=16,
                                   ignore_inconsistent=off,
                                   snapmirrored=off,
                                   resyncsnaptime=60,
                                   fs_size_fixed=off,
                                   snapshot_autodelete=on,
                                   lost_write_protect=on
        Volumes: vol0
        Plex /aggr0/plex1: online, normal, active
          RAID group /aggr0/plex1/rg0: normal
        Plex /aggr0/plex3: online, normal, active
          RAID group /aggr0/plex3/rg0: normal

4. Once the synchronization is done, power on the filer.

    RLM FilerB> system power on
    RLM FilerB> system console
    Type Ctrl-D to exit.
    Boot Loader version 1.2.3
    Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
    Portions Copyright (C) 2002-2006 NetApp Inc.
    NetApp Release 7.2.3: Sat Oct 20 17:27:02 PDT 2007
    Copyright (c) 1992-2007 NetApp, Inc.
    Starting boot on Tue Feb 5 15:37:40 GMT 2008
    Tue Feb 5 15:38:31 GMT [ses.giveback.wait:info]: Enclosure Services will be unavailable while waiting for giveback.
    Press Ctrl-C for Maintenance menu to release disks.
    Waiting for giveback

5. Execute cf giveback on FilerB.

    FilerA(takeover)> cf status
    FilerA has taken over FilerB.
    FilerB is ready for giveback.
    FilerA(takeover)> cf giveback
    please make sure you have rejoined your aggr before giveback.
    Do you wish to continue [y/n] ?? y
    FilerA> cf status
    Tue Feb 5 16:41:00 CET [FilerA: monitor.globalStatus.ok:info]: The system's global status is normal.
    Cluster enabled, FilerB is up.
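The giveback prompt asks you to confirm that the aggregates have been rejoined, so it can help to verify this mechanically before answering y. The sketch below is one possible pre-giveback check in Python, assuming the aggr status -v output and the console log have been captured as text. The function names (unhealthy_plexes, resynced_aggregates, ready_for_giveback) are hypothetical, and the rule that a healthy plex reads exactly "online, normal, active" is an assumption drawn from the captures in this post, not an official NetApp criterion.

```python
import re

# Assumption: a healthy plex is reported exactly as "online, normal, active".
HEALTHY = {"online", "normal", "active"}

# "Plex /aggr0/plex1: online, normal, active" (format taken from the captures)
PLEX_LINE = re.compile(r"Plex\s+(/\S+?):\s*([a-z\-]+(?:,\s*[a-z\-]+)*)")

# "[...: raid.mirror.resync.done:notice]: /aggr0: resynchronization completed in ..."
RESYNC_DONE = re.compile(
    r"raid\.mirror\.resync\.done:notice\]:\s*/(\S+?):\s*resynchronization completed"
)


def unhealthy_plexes(status_text):
    """Return {plex_path: sorted states} for every plex that is not healthy."""
    bad = {}
    for path, states in PLEX_LINE.findall(status_text):
        state_set = {s.strip() for s in states.split(",")}
        if state_set != HEALTHY:
            bad[path] = sorted(state_set)
    return bad


def resynced_aggregates(log_text):
    """Return the set of aggregate names with a resync-done message in the log."""
    return set(RESYNC_DONE.findall(log_text))


def ready_for_giveback(status_text, log_text, aggregates):
    """True only if no plex is unhealthy and every listed aggregate has resynced."""
    all_resynced = set(aggregates) <= resynced_aggregates(log_text)
    return not unhealthy_plexes(status_text) and all_resynced


# Samples copied from the failback captures: before and after `aggr mirror`.
BEFORE = """\
Plex /aggr0(1)/plex0: offline, normal, out-of-date
RAID group /aggr0(1)/plex0/rg0: normal
Plex /aggr0(1)/plex1: offline, failed, out-of-date
Plex /aggr0/plex1: online, normal, active
"""
AFTER = """\
Plex /aggr0/plex1: online, normal, active
RAID group /aggr0/plex1/rg0: normal
Plex /aggr0/plex3: online, normal, active
"""
LOG = ("[FilerB/FilerA: raid.mirror.resync.done:notice]: "
       "/aggr0: resynchronization completed in 0:03.36")

if __name__ == "__main__":
    print(unhealthy_plexes(BEFORE))
    print(ready_for_giveback(AFTER, LOG, ["aggr0"]))
```

The check is deliberately strict: any plex state other than the three healthy ones blocks the giveback, which errs on the side of waiting longer rather than giving back with a stale mirror.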
