Friday, February 8, 2013

NetApp MetroCluster DR Test

MetroCluster Disaster Recovery Test

Description
To test disaster recovery, you must restrict access to the disaster-site node so that it cannot resume serving data. If you do not, you risk data corruption.
Failover Procedure
1.      Stop the ISL connections between the sites.
Switch-Site-01 and Switch-Site-02 are located at the disaster site, where filer FilerA is located.

Ask the network team to log in to these switches and disable the ISL ports.

Note: Once the ISL is stopped, automatic failover is disabled. FilerB cannot take over FilerA automatically.

2.      Once the ISL is stopped, run the command below on FilerB and confirm that the disks of FilerA appear in the failed state. The same command can also be run on FilerA.

FilerB> aggr status -r
Aggregate aggr0 (online, raid_dp, mirror degraded) (block checksums)
Plex /aggr0/plex0 (online, normal, active, pool0)
RAID group /aggr0/plex0/rg0 (normal)

RAID Disk Device     HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)   Phys (MB/blks)
---------------------------------------------------------------------------------------
dparity SITEA03:5.16 0b  1     0   FC:B  0  FCAL  10000 272000/557056000 280104/573653840
parity  SITEA02:5.32 0c  2     0   FC:A  0  FCAL  10000 272000/557056000 280104/573653840
data    SITEA03:6.16 0d  1     0   FC:B  0  FCAL  10000 272000/557056000 280104/573653840

Plex /aggr0/plex1 (offline, failed, inactive, pool1)
RAID group /aggr0/plex1/rg0 (partial)

RAID Disk Device HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks) Phys (MB/blks)
---------------------------------------------------------------------------------
dparity   FAILED                     N/A            272000/557056000
parity    FAILED                     N/A            272000/557056000
data      FAILED                     N/A            272000/557056000
Raid group is missing 3 disks.
FilerB>
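
When this check is repeated across many aggregates, reading the output by eye gets tedious. The following is a minimal Python sketch, not a NetApp tool, that scans captured `aggr status -r` text for plexes reported as failed. The function names are hypothetical, and the sketch assumes the plex lines keep the layout shown above.

```python
import re

# Matches lines such as "Plex /aggr0/plex1 (offline, failed, inactive, pool1)".
PLEX_LINE = re.compile(r"^\s*Plex\s+(\S+)\s+\(([^)]*)\)")


def plex_states(output):
    """Map each plex path to the list of state words printed next to it."""
    states = {}
    for line in output.splitlines():
        match = PLEX_LINE.match(line)
        if match:
            path, flags = match.groups()
            states[path] = [flag.strip() for flag in flags.split(",")]
    return states


def failed_plexes(output):
    """Return the plex paths whose state list contains 'failed'."""
    return [path for path, flags in plex_states(output).items() if "failed" in flags]


# Trimmed copy of the output shown above (disk rows omitted).
sample = """\
Aggregate aggr0 (online, raid_dp, mirror degraded) (block checksums)
Plex /aggr0/plex0 (online, normal, active, pool0)
RAID group /aggr0/plex0/rg0 (normal)
Plex /aggr0/plex1 (offline, failed, inactive, pool1)
RAID group /aggr0/plex1/rg0 (partial)
"""

print(failed_plexes(sample))  # ['/aggr0/plex1']
```

After the ISL is cut, every mirrored aggregate is expected to show its remote plex in this list.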

  3. Connect to the Remote LAN Management (RLM) console of the disaster-site controller (FilerA) and halt the controller.

    FilerA> halt
    Boot Loader version 1.2.3
    Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
    Portions Copyright (C) 2002-2006 NetApp Inc.


    CPU Type: Dual Core AMD Opteron(tm) Processor 265
    OK>

  4. Power off the controller from the RLM.

    RLM FilerA> system power off

  5. Log in to FilerB and execute a forced takeover.

    FilerB> cf forcetakeover -d

  6. Check the aggregate status on FilerB, which is now in takeover mode.

FilerB(takeover)> aggr status -v
Aggr State     Status           Options
aggr0 online   raid_dp, aggr    root, diskroot, nosnap=off,
               mirror degraded  raidtype=raid_dp, raidsize=16,
                                ignore_inconsistent=off,
                                snapmirrored=off,
                                resyncsnaptime=60,
                                fs_size_fixed=off,
                                snapshot_autodelete=on,
                                lost_write_protect=on
                Volumes: vol0

                Plex /aggr0/plex0: online, normal, active
                    RAID group /aggr0/plex0/rg0: normal

                Plex /aggr0/plex1: offline, failed, inactive

FilerB/FilerA> aggr status -v
Aggr State      Status            Options
aggr0 online    raid_dp, aggr     root, diskroot, nosnap=off,
                                  raidtype=raid_dp, raidsize=16,
                                  ignore_inconsistent=off,
                                  snapmirrored=off,
                                  resyncsnaptime=60,
                                  fs_size_fixed=off,
                                  snapshot_autodelete=on,
                                  lost_write_protect=on
                Volumes: vol0

                Plex /aggr0/plex1: online, normal, active
                    RAID group /aggr0/plex1/rg0: normal
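
The order of the failover steps above matters: the forced takeover is only safe once the disaster-site node is isolated and powered off. One way to make that explicit in a runbook script is a small guard like the sketch below. It is an illustration only; the class and function names are hypothetical, and the flags are set by the operator rather than queried from the filers or switches.

```python
from dataclasses import dataclass


@dataclass
class FailoverPreconditions:
    """Operator-confirmed facts; nothing here is queried from the filers."""
    isl_ports_disabled: bool = False
    disaster_node_halted: bool = False
    disaster_node_powered_off: bool = False


def unmet_preconditions(state):
    """Return a description of every step still outstanding before forced takeover."""
    checks = [
        (state.isl_ports_disabled, "disable the ISL ports on both switches"),
        (state.disaster_node_halted, "halt the disaster-site controller"),
        (state.disaster_node_powered_off, "power off the disaster-site controller via RLM"),
    ]
    return [text for done, text in checks if not done]


def takeover_command(state):
    """Return the takeover command only when every precondition is confirmed."""
    missing = unmet_preconditions(state)
    if missing:
        raise RuntimeError("Not safe to take over yet: " + "; ".join(missing))
    return "cf forcetakeover -d"


ready = FailoverPreconditions(True, True, True)
print(takeover_command(ready))  # cf forcetakeover -d
```

Raising an error instead of returning an empty string keeps a half-finished checklist from silently producing a runnable command.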
Failback Procedure
1.      After testing, re-enable the ISL ports. Log in to Switch-Site-01 and Switch-Site-02 and enable the ports.

The plexes on FilerA will now be out of date and must be resynchronized manually.


FilerB/FilerA> aggr status -v
      Aggr State      Status            Options
  aggr0(1) failed     raid_dp, aggr     diskroot, raidtype=raid_dp,
                      out-of-date       raidsize=16, resyncsnaptime=60,
                                        lost_write_protect=off
           Volumes:

           Plex /aggr0(1)/plex0: offline, normal, out-of-date
               RAID group /aggr0(1)/plex0/rg0: normal

           Plex /aggr0(1)/plex1: offline, failed, out-of-date

     aggr0 online     raid_dp, aggr     root, diskroot, nosnap=off,
                                        raidtype=raid_dp, raidsize=16,
                                        ignore_inconsistent=off,
                                        snapmirrored=off,
                                        resyncsnaptime=60,
                                        fs_size_fixed=off,
                                        snapshot_autodelete=on,
                                        lost_write_protect=on
           Volumes: vol0

           Plex /aggr0/plex1: online, normal, active
               RAID group /aggr0/plex1/rg0: normal
2.      Re-create the mirror for each aggregate.

FilerB/FilerA> aggr mirror aggr0 -v aggr0(1)
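
With more than one aggregate, the same command has to be typed once per out-of-date copy. The sketch below shows one way the command lines could be generated from a list of aggregate names. It assumes that stale copies always carry a numeric suffix in parentheses, as `aggr0(1)` does here; the function name is hypothetical, and the output is only text to review and paste, not something sent to the filer.

```python
import re

# Matches a stale copy name such as "aggr0(1)" and captures the base name "aggr0".
STALE_COPY = re.compile(r"^(?P<base>[A-Za-z0-9_]+)\(\d+\)$")


def mirror_commands(aggregate_names):
    """Build one 'aggr mirror' command line per stale aggregate copy."""
    commands = []
    for name in aggregate_names:
        match = STALE_COPY.match(name)
        if match:
            # Same form as the command above: aggr mirror <base> -v <stale copy>
            commands.append(f"aggr mirror {match.group('base')} -v {name}")
    return commands


print(mirror_commands(["aggr0", "aggr0(1)"]))  # ['aggr mirror aggr0 -v aggr0(1)']
```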

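3.      Wait for all aggregates to finish resynchronizing. A message like the following is logged for each aggregate when its resynchronization completes.

[FilerB/FilerA: raid.mirror.resync.done:notice]: /aggr0: resynchronization completed in 0:03.36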

FilerB/FilerA> aggr status -v
    Aggr State     Status           Options
    aggr0 online   raid_dp, aggr    root, diskroot, nosnap=off,
                   mirrored         raidtype=raid_dp, raidsize=16,
                                    ignore_inconsistent=off,
                                    snapmirrored=off,
                                    resyncsnaptime=60,
                                    fs_size_fixed=off,
                                    snapshot_autodelete=on,
                                    lost_write_protect=on
         Volumes: vol0

         Plex /aggr0/plex1: online, normal, active
         RAID group /aggr0/plex1/rg0: normal

         Plex /aggr0/plex3: online, normal, active
         RAID group /aggr0/plex3/rg0: normal
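
To confirm that every aggregate has finished before powering the filer back on, the console log can be searched for the completion message. The following Python sketch is an illustration under the assumption that the message keeps the wording shown above; the function name is hypothetical. The elapsed time is kept as the raw string because the source does not say how its format should be read.

```python
import re

# Captures the aggregate path and the elapsed-time string from a resync-done message.
RESYNC_DONE = re.compile(
    r"raid\.mirror\.resync\.done:notice\]:\s+(?P<aggr>/\S+):\s+"
    r"resynchronization completed in (?P<elapsed>\S+)"
)


def resynced_aggregates(log_text):
    """Map each aggregate path to the elapsed time reported for its resync."""
    return {m.group("aggr"): m.group("elapsed") for m in RESYNC_DONE.finditer(log_text)}


line = (
    "[FilerB/FilerA: raid.mirror.resync.done:notice]: "
    "/aggr0: resynchronization completed in 0:03.36"
)
print(resynced_aggregates(line))  # {'/aggr0': '0:03.36'}
```

Comparing the keys of this mapping with the list of mirrored aggregates shows which ones are still resynchronizing.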

4.      Once synchronization is complete, power on the filer at the disaster site from its RLM and open its console.

RLM FilerA> system power on
RLM FilerA> system console
Type Ctrl-D to exit.


Boot Loader version 1.2.3
Copyright (C) 2000,2001,2002,2003 Broadcom Corporation.
Portions Copyright (C) 2002-2006 NetApp Inc.


NetApp Release 7.2.3: Sat Oct 20 17:27:02 PDT 2007
Copyright (c) 1992-2007 NetApp, Inc.
Starting boot on Tue Feb  5 15:37:40 GMT 2008
Tue Feb  5 15:38:31 GMT [ses.giveback.wait:info]: Enclosure Services will be unavailable while waiting for giveback.
Press Ctrl-C for Maintenance menu to release disks.
Waiting for giveback

  5. Check the takeover status on FilerB, then execute cf giveback.

    FilerB(takeover)> cf status
    FilerB has taken over FilerA.
    FilerA is ready for giveback.


    FilerB(takeover)> cf giveback
    please make sure you have rejoined your aggr before giveback.
    Do you wish to continue [y/n] ?? y


    FilerB> cf status
    Tue Feb  5 16:41:00 CET [FilerB: monitor.globalStatus.ok:info]: The system's global status is normal.
    Cluster enabled, FilerA is up.
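
Giveback should only be issued once `cf status` reports that the partner is ready. The sketch below shows how that check could be scripted against captured `cf status` text. It assumes the two status sentences keep the wording shown above; the function name and the sample node names `nodeA` and `nodeB` are hypothetical placeholders.

```python
import re

# "<local> has taken over <partner>." and "<partner> is ready for giveback."
TAKEN_OVER = re.compile(r"^\s*(\S+) has taken over (\S+)\.", re.MULTILINE)
READY = re.compile(r"^\s*(\S+) is ready for giveback\.", re.MULTILINE)


def ready_for_giveback(cf_status_output):
    """Return the partner name if it was taken over and reports ready, else None."""
    taken = TAKEN_OVER.search(cf_status_output)
    ready = READY.search(cf_status_output)
    if taken and ready and taken.group(2) == ready.group(1):
        return ready.group(1)
    return None


during_takeover = "nodeB has taken over nodeA.\nnodeA is ready for giveback.\n"
after_giveback = "Cluster enabled, nodeA is up.\n"

print(ready_for_giveback(during_takeover))  # nodeA
print(ready_for_giveback(after_giveback))   # None
```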

