17JAN/111
We got a notification from our filers that a partner path is misconfigured. We have an active/active cluster.
Run the following command on the filer that issued the error: lun stats -o -i 1 -c 1
You can change the interval (-i) and count (-c) values to more suitable levels if you like. The results are as follows:
filer> lun stats -o -i 1 -c 1
 Read Write Other QFull  Read Write Average  Queue   Partner  Lun
  Ops   Ops   Ops          kB    kB Latency Length   Ops  kB
    0     2     0     0     0     8    0.50   2.00     0   0  /vol/Vol01/LunA
    2     2     0     0     8     8    6.50   2.00     8  63  /vol/Vol02/LunB
    7     2     0     0    28     8   19.00   1.03     0   0  /vol/Vol03/LunC
This command lists every Lun on the filer and how many ops are processed for each. One of the columns is called "Partner Ops". If this column shows any number higher than 0, it means that this Lun is receiving data over the wrong filer head and that all of that data is transferred over the Interlink cable (the one that syncs the NVRAM of the cache card in the cluster heads). In the output above, /vol/Vol02/LunB is affected: it shows 8 partner ops and 63 partner kB. You can resolve this by installing or updating the "host utilities" and/or the MPIO software and letting this tool configure the multi-paths to this storage.
You can read more about this in the following Netapp article: https://kb.netapp.com/support/index?page=content&id=3010111&pmv=print&impressions=false
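The "Partner Ops" check described above can also be scripted. The following is a minimal Python sketch, assuming the output has the same column layout as the sample in this post (ten numeric columns followed by the Lun path); other Data ONTAP versions may print the columns differently. The function name luns_with_partner_traffic is made up for this example.

```python
# Sample rows copied from the "lun stats -o -i 1 -c 1" output shown in this post.
SAMPLE = """\
filer> lun stats -o -i 1 -c 1
0 2 0 0 0 8 0.50 2.00 0 0 /vol/Vol01/LunA
2 2 0 0 8 8 6.50 2.00 8 63 /vol/Vol02/LunB
7 2 0 0 28 8 19.00 1.03 0 0 /vol/Vol03/LunC
"""


def luns_with_partner_traffic(output):
    """Return (lun_path, partner_ops, partner_kb) for every row with Partner Ops > 0.

    Assumes the column order of the sample: Partner Ops is the 9th field,
    Partner kB the 10th, and the Lun path the 11th.
    """
    hits = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the prompt, header lines and anything that is not a data row.
        if len(fields) != 11 or not fields[10].startswith("/vol/"):
            continue
        partner_ops = int(fields[8])
        partner_kb = int(fields[9])
        if partner_ops > 0:
            hits.append((fields[10], partner_ops, partner_kb))
    return hits


if __name__ == "__main__":
    for lun, ops, kb in luns_with_partner_traffic(SAMPLE):
        print(f"{lun}: {ops} partner ops, {kb} partner kB")
```

With the sample data only /vol/Vol02/LunB is reported, which matches the row that has a non-zero "Partner Ops" value.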
The technical background behind this problem lies in the "FC Nodename" and the "FC Portname". In our cluster we have 4 paths from our server to the LUN. All paths will use the same WWN ID for the "FC Nodename" and a separate WWN ID for the "FC Portname".
You can check the FC Nodename and FC Portname of your filer with the following command: sysconfig -v 2
The data will look like this (please note that I removed some lines that were not needed for this example).
filer> sysconfig -v 2
slot 2: Fibre Channel Target Host Adapter 2a
        (Dual-channel, QLogic 2432(2462) rev. 2, 64-bit, <ONLINE>)
        Firmware rev: 4.5.2
        FC Nodename: bb:0a:09:80:87:29:7f:bb (bb0a098087297fbb)
        FC Portname: bb:0a:09:81:97:29:7f:bb (bb0a098197297fbb)
        Connection: PTP, Fabric
slot 2: Fibre Channel Target Host Adapter 2b
        (Dual-channel, QLogic 2432(2462) rev. 2, 64-bit, <ONLINE>)
        Firmware rev: 4.5.2
        FC Nodename: bb:0a:09:80:87:29:7f:bb (bb0a098087297fbb)
        FC Portname: bb:0a:09:82:97:29:7f:bb (bb0a098297297fbb)
        Connection: PTP, Fabric
Both adapters report the same "FC Nodename" WWN ID, and the two "FC Portname" values differ from each other by only one digit (bb:0a:09:81:... versus bb:0a:09:82:...). In a multi-path configuration the only way to tell one FC path to a Lun from another is by looking at the "FC Portname".
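To make the comparison concrete, here is a small Python sketch that pulls the WWN IDs out of the sysconfig output above and shows where the two port names differ. It is only an illustration based on the trimmed sample in this post; the helper names parse_wwns and differing_positions are made up for this example.

```python
import re

# Trimmed sample from the "sysconfig -v 2" output shown in this post.
SAMPLE = """\
slot 2: Fibre Channel Target Host Adapter 2a
        FC Nodename: bb:0a:09:80:87:29:7f:bb (bb0a098087297fbb)
        FC Portname: bb:0a:09:81:97:29:7f:bb (bb0a098197297fbb)
slot 2: Fibre Channel Target Host Adapter 2b
        FC Nodename: bb:0a:09:80:87:29:7f:bb (bb0a098087297fbb)
        FC Portname: bb:0a:09:82:97:29:7f:bb (bb0a098297297fbb)
"""

# Eight colon-separated hex bytes, e.g. bb:0a:09:80:87:29:7f:bb
WWN = r"((?:[0-9a-f]{2}:){7}[0-9a-f]{2})"


def parse_wwns(text):
    """Return (nodenames, portnames) in the order they appear in the text."""
    nodenames = re.findall(r"FC Nodename:\s*" + WWN, text)
    portnames = re.findall(r"FC Portname:\s*" + WWN, text)
    return nodenames, portnames


def differing_positions(a, b):
    """Return the indexes of the hex digits that differ (colons removed)."""
    a, b = a.replace(":", ""), b.replace(":", "")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    nodes, ports = parse_wwns(SAMPLE)
    print("Same nodename on both adapters:", len(set(nodes)) == 1)
    print("Portname digits that differ:", differing_positions(ports[0], ports[1]))
```

For the sample, both adapters share one node name and the port names differ in a single hex digit, so the port name is the only thing that identifies an individual path.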
Since we have 4 paths to our Lun, the server will see the same drive 4 times. The MPIO software (like the built-in software from Windows 2008) will need to "merge" these disks into one and select one path as primary. Should this path fail, the MPIO software will select a different path without downtime on the server. Since the default MPIO software can't see which path is the best one to use, it's possible that it selects a path to the filer head that is not handling the Lun. When this happens, all data written to this Lun will be sent to the wrong (passive) filer head. This filer head will forward all data it receives to the active head over the Interlink cable that normally synchronises the NVRAM of the cache card. The active head will then write the data to disk by the normal procedure.
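The difference between a "blind" and an owner-aware path choice can be sketched in a few lines of Python. This is a simplified model, not how any real MPIO driver or the NetApp tools are implemented: the Path class, the pick_primary function, the port labels and the head names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    portname: str        # label for the target port (hypothetical values)
    head: str            # filer head this target port belongs to
    online: bool = True


def pick_primary(paths, owning_head):
    """Prefer an online path to the head that owns the Lun.

    Falls back to any online path (traffic then crosses the Interlink)
    and returns None when no path is online at all.
    """
    online = [p for p in paths if p.online]
    direct = [p for p in online if p.head == owning_head]
    candidates = direct or online
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Four paths, two per filer head; the Lun is owned by FilerB.
    paths = [
        Path("port-A1", "FilerA"),
        Path("port-A2", "FilerA"),
        Path("port-B1", "FilerB"),
        Path("port-B2", "FilerB"),
    ]
    print("Blind choice (first path):", paths[0].portname)
    print("Owner-aware choice:", pick_primary(paths, "FilerB").portname)
```

The blind choice lands on a FilerA port even though FilerB owns the Lun, which is exactly the situation that shows up as "Partner Ops" on the filer. The owner-aware choice only uses a FilerA port when no FilerB path is left.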
The Host Utilities provided by NetApp (or by an OEM supplier such as IBM) have a method to select the best path, that is, the path connected to the active filer head. So be sure to install and/or update them and, in the case of a Linux/ESX host, run the following command every time you add a new Lun to the system: /opt/ontap/santools/config_mpath --access <FilerA>:<user>:<password> --access <FilerB>:<user>:<password> --primary --loadbalance
Be sure to replace all the <> variables with their respective values.