Today a customer's VNX storage array took a performance hit. We dug into the problem and found that it comes down to a behavior specific to FLARE: a condition called Forced Flushing. It occurs when the percentage of dirty cache pages crosses the high watermark and reaches 100%. At that point, the cache starts forcefully flushing unsaved (dirty) data to disk, suspending all host IO. Forced flushing continues until the percentage of dirty pages recedes below the low watermark.
Forced flushing affects the entire array and all workloads served by it. It significantly increases host response time until the number of dirty cache pages falls below the low watermark, because the Storage Processor gives priority to writing dirty data to disk rather than allocating new pages for incoming host IO. The high and low watermarks were implemented as a mechanism to avoid forced flushing. The lower the high watermark, the larger the reserved buffer in the cache, and the smaller the chance that forced flushing will occur.
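
To make the watermark behavior concrete, here is a minimal toy model in Python. It is only a sketch of the logic described above, not FLARE's actual implementation: the class name `WriteCache`, the `tick` method, the per-step drain rates, and the rule that background flushing starts once the dirty fraction exceeds the high watermark are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WriteCache:
    """Toy write cache with high/low watermarks (hypothetical, not FLARE code)."""
    total_pages: int
    low_watermark: float    # fraction of pages, e.g. 0.60
    high_watermark: float   # fraction of pages, e.g. 0.80
    dirty_pages: int = 0
    forced_flushing: bool = False

    @property
    def dirty_fraction(self) -> float:
        return self.dirty_pages / self.total_pages

    def tick(self, incoming_pages: int, normal_drain: int, forced_drain: int) -> int:
        """Advance one time step and return how many host pages were accepted."""
        if self.forced_flushing:
            # Host IO is suspended: only flush dirty pages to disk.
            self.dirty_pages = max(0, self.dirty_pages - forced_drain)
            # Forced flushing ends once we are below the low watermark.
            if self.dirty_fraction < self.low_watermark:
                self.forced_flushing = False
            return 0

        # Assumption: background flushing kicks in above the high watermark.
        if self.dirty_fraction > self.high_watermark:
            self.dirty_pages = max(0, self.dirty_pages - normal_drain)

        # Accept as many incoming pages as there is free space for.
        accepted = min(incoming_pages, self.total_pages - self.dirty_pages)
        self.dirty_pages += accepted

        # Dirty pages at 100% of the cache triggers forced flushing.
        if self.dirty_pages >= self.total_pages:
            self.forced_flushing = True
        return accepted
```

With a 100-page cache and watermarks at 60% and 80%, two bursts of 50 pages fill the cache and trigger forced flushing; the following steps accept no host pages until the dirty fraction drops below 60%.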
So why can the SP Dirty Page % occasionally reach 95%? Because there are too many inbound IOs and the backend disks may be overloaded, so the cache does not have enough time to write the dirty pages to disk.
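
The arithmetic behind this is simple: the cache fills whenever data arrives faster than the backend can drain it. The sketch below is a rough back-of-the-envelope helper, not anything the array exposes; the function name and the example figures are hypothetical and chosen only to illustrate the point.

```python
from typing import Optional


def seconds_until_cache_full(free_mb: float,
                             inbound_mb_s: float,
                             drain_mb_s: float) -> Optional[float]:
    """Estimate how long until the write cache is 100% dirty.

    Returns None when the backend drains at least as fast as the hosts write,
    because in that case the cache never fills.
    """
    net_fill_rate = inbound_mb_s - drain_mb_s
    if net_fill_rate <= 0:
        return None
    return free_mb / net_fill_rate


# Illustrative numbers only: 2000 MB of free cache, hosts writing 500 MB/s,
# backend disks absorbing 300 MB/s.
print(seconds_until_cache_full(2000, 500, 300))  # 10.0
```

If the backend keeps up (drain rate at or above the inbound rate), the function returns `None` and forced flushing should not be triggered by this workload alone.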
The default thresholds for the different performance parameters in EMC VNX are listed below:
Parameter | Default threshold
FC | 160 IOPS
EFD | 2500 IOPS
ATA | 70 IOPS
SATAII | 90 IOPS
SAS | 160 IOPS
NL SAS | 120 IOPS
Dirty pages | 95%
Disk response time | > 15 ms (if total IOPS > 20)
Average seek distance | 10% or > 30 GB/s
LUN response time | > 22 ms (if total IOPS > 20)
BE bandwidth | > 320 MB/s, or 2160 MB/s for VNX
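
As a rough illustration of how these thresholds might be applied to collected statistics, here is a small Python sketch. The function names, the alert labels, and the exact comparison operators (`>` versus `>=`) are assumptions; only the numeric thresholds come from the list above.

```python
# Per-disk IOPS thresholds by drive type (values from the list above).
DISK_IOPS_THRESHOLD = {
    "FC": 160, "EFD": 2500, "ATA": 70,
    "SATAII": 90, "SAS": 160, "NL SAS": 120,
}
DIRTY_PAGES_PCT = 95     # SP dirty pages threshold, in percent
DISK_RESPONSE_MS = 15    # only meaningful when total IOPS > MIN_IOPS
LUN_RESPONSE_MS = 22     # only meaningful when total IOPS > MIN_IOPS
MIN_IOPS = 20


def check_disk(disk_type: str, iops: float, response_ms: float) -> list:
    """Return a list of hypothetical alert labels for a single disk."""
    alerts = []
    if iops > DISK_IOPS_THRESHOLD[disk_type]:
        alerts.append("iops")
    # Response time is ignored on nearly idle disks (total IOPS <= 20).
    if iops > MIN_IOPS and response_ms > DISK_RESPONSE_MS:
        alerts.append("response_time")
    return alerts


def check_lun(iops: float, response_ms: float) -> bool:
    """True when the LUN response time exceeds its threshold under load."""
    return iops > MIN_IOPS and response_ms > LUN_RESPONSE_MS


def check_dirty_pages(dirty_pct: float) -> bool:
    """True when SP dirty pages have reached the threshold (assumes >=)."""
    return dirty_pct >= DIRTY_PAGES_PCT


print(check_disk("NL SAS", iops=150, response_ms=10))  # ['iops']
```

The `MIN_IOPS` guard mirrors the "if total IOPS > 20" condition: a slow response time on an almost idle disk or LUN is not treated as a problem.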