Wednesday, May 18, 2011

Data availability tiers - recommendations for NetApp and IBM N series storage

Descriptions of NetApp features and usage recommendations for each data availability tier.

Mission-Critical – high-demand services, e.g. OLTP, batch transaction processing, virtualization/cloud environments.

Flash Cache
Use Flash Cache to improve system performance and minimize the impact on foreground I/O while in degraded mode situations.
SyncMirror
Use local SyncMirror to provide shelf-level resiliency and to improve performance in degraded mode situations.
Spares
Use a maximum hot spares approach so that sufficient disks are available for corrective actions. Set the RAID option raid.min_spare_count to the recommended number of spares so that the administrator is notified when the spare count drops below the recommendation.
Drive Type
Use performance drives (SAS, FC, or SSD) instead of capacity drives (SATA). Smaller-capacity 15k rpm or SSD drives result in shorter times for corrective actions. This is important when foreground I/O is prioritized over corrective I/O, which increases times for corrective actions. Performance drives help offset that performance delta.
Aggregate Fullness
Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures further degrade foreground I/O performance when drives are nearing full data capacity.
Utilization Monitoring
Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If utilization is greater than 50%, you are at increased risk of greater foreground I/O degradation in degraded mode situations. High utilization can also increase the time it takes for corrective actions to complete.
I/O Prioritization
Prioritize foreground I/O over corrective I/O by adjusting the RAID option raid.reconstruct.perf_impact to Low.
Scrubs
Use the default settings for RAID scrubs and media scrubs. Systems are assumed to be highly utilized, so increasing the duration of scrubs will likely provide a reduced benefit to data integrity while consuming additional system resources.
Maintenance Center
Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. This also facilitates the RMA process for failed drives to make sure the system returns to a normal operating state in a timely manner.
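
The 50% utilization guideline above can be expressed as a simple check. This is only a minimal sketch: the metric names and the sample readings are hypothetical, and it assumes you already collect utilization percentages with whatever monitoring tool you use. Only the 50% threshold comes from the recommendation itself.

```python
# Threshold from the recommendation above: utilization greater than 50%
# means increased risk in degraded mode situations.
UTILIZATION_THRESHOLD_PCT = 50.0


def over_threshold(samples, threshold=UTILIZATION_THRESHOLD_PCT):
    """Return the sorted names of metrics whose utilization exceeds the threshold."""
    return sorted(name for name, pct in samples.items() if pct > threshold)


# Hypothetical readings in percent (not real measurements).
readings = {"cpu": 42.0, "disk": 63.5, "loop_stack_bandwidth": 50.0}

for metric in over_threshold(readings):
    print(f"WARNING: {metric} utilization is above {UTILIZATION_THRESHOLD_PCT}%")
```

A reading of exactly 50% is not flagged here, because the text says "greater than 50%".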


Business-Critical – data that must meet compliance requirements and/or intellectual property, e.g. medical records, software source code, and e-mail.

Flash Cache
Use Flash Cache to improve system performance and minimize the impact on foreground I/O while in degraded mode situations.
SyncMirror
Use local SyncMirror to provide shelf-level resiliency and to improve performance in degraded mode situations.
Spares
Use a maximum hot spares approach so that sufficient disks are available for corrective actions. Set the RAID option raid.min_spare_count to the recommended number of spares so that the administrator is notified when the spare count drops below the recommendation.
Drive Type
Use performance drives (SAS, FC, or SSD) instead of capacity drives (SATA). Smaller-capacity 15k rpm or SSD drives result in shorter times for corrective actions. This is important when foreground I/O is prioritized over corrective I/O, which increases times for corrective actions. Performance drives help offset that performance delta.
Aggregate Fullness
Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures will further degrade foreground I/O performance when drives near full data capacity.
Utilization Monitoring
Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If utilization is greater than 50%, you are at increased risk of greater foreground I/O degradation in degraded mode situations. High utilization can also increase the time it takes for corrective actions to complete.
I/O Prioritization
Use the default setting of Medium for the RAID option raid.reconstruct.perf_impact to balance foreground I/O and corrective I/O.
Scrubs
Consider increasing the frequency of RAID scrubs to increase integrity of data at rest.
Maintenance Center
Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. This also facilitates the RMA process for failed drives so that systems return to a normal operating state in a timely manner.


Repository – used to store collaborative data or user data that is noncritical to business operations.

Flash Cache
Use Flash Cache to improve system performance and minimize the impact on foreground I/O while in degraded mode situations.
SyncMirror
Use local SyncMirror to provide shelf-level resiliency and to improve performance in degraded mode situations.
Spares
Use a balanced hot spares approach to allow more disks to be used for system capacity. Set the RAID option raid.min_spare_count to the recommended number of spares so that the administrator is notified when the spare count drops below the recommendation.
Drive Type
Consider using SATA drives (backed by Flash Cache) for these types of configurations.
Aggregate Fullness
Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures will further degrade foreground I/O performance when drives near full data capacity.
Utilization Monitoring
Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If your utilization is greater than 50%, you are at increased risk for greater foreground I/O degradation in degraded mode situations. This can also increase the time it takes for corrective actions to complete.
I/O Prioritization
Use the default setting of Medium for the RAID option raid.reconstruct.perf_impact to balance foreground I/O and corrective I/O.
Scrubs
Consider increasing the frequency of RAID scrubs to increase the integrity of data at rest.
Maintenance Center
Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. This also facilitates the RMA process for failed drives so that systems return to a normal operating state in a timely manner.


Archival – a large initial ingest of data (write), which then is seldom accessed. Priority is maintaining data integrity.

Spares
Use a maximum hot spares approach so that sufficient disks are available for corrective actions. Set the RAID option raid.min_spare_count to the recommended number of spares so that the administrator is notified when the spare count drops below the recommendation.
Drive Type
Consider using SATA drives (backed by Flash Cache) for these types of configurations.
Aggregate Fullness
Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures will further degrade foreground I/O performance when drives near full data capacity.
Utilization Monitoring
Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If your utilization is greater than 50%, you are at increased risk for greater foreground I/O degradation in degraded mode situations. This can also increase the time it takes for corrective actions to complete.
I/O Prioritization
Use the default setting of Medium for the RAID option raid.reconstruct.perf_impact to balance foreground I/O and corrective I/O.
Scrubs
Consider increasing the RAID scrub duration (raid.scrub.duration) to help ensure the integrity of data at rest. Consider increasing the media scrub rate (raid.media_scrub.rate) to increase drive-level block integrity.
Maintenance Center
Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. This also facilitates the RMA process for failed drives so that systems return to a normal operating state in a timely manner.
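
The I/O Prioritization recommendations for the four tiers above differ only in the value of raid.reconstruct.perf_impact, so they can be summarized as a small lookup. The sketch below is illustrative rather than an official tool: the tier keys and the helper name are made up for this example, and the rendered command assumes the Data ONTAP 7-Mode `options <name> <value>` syntax. Verify the command against your own system before running it.

```python
# Value of raid.reconstruct.perf_impact recommended for each tier in the text above.
# The lowercase tier keys are a naming choice for this sketch only.
PERF_IMPACT_BY_TIER = {
    "mission-critical": "low",      # prioritize foreground I/O over corrective I/O
    "business-critical": "medium",  # default: balance foreground and corrective I/O
    "repository": "medium",
    "archival": "medium",
}


def perf_impact_command(tier):
    """Render the option command for a tier (assumes 7-Mode 'options' syntax)."""
    value = PERF_IMPACT_BY_TIER[tier.lower()]
    return f"options raid.reconstruct.perf_impact {value}"


for tier in PERF_IMPACT_BY_TIER:
    print(f"{tier}: {perf_impact_command(tier)}")
```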


Multipurpose – mixed environment.

Prioritize Recommendations
Prioritize configuration recommendations for the most sensitive tier of data availability when conflicting recommendations are present.
FlexShare®
Consider using FlexShare to prioritize system resources between data volumes.
Physical Segregation
Segregate the physical shelf and the drive layout for multiple data-availability tiers. For example, if you have both SAS and SATA (DS4243) attached to the same system, you could use the SAS drives to host mission-critical data while using the SATA drives to host archival data. Although you can mix DS4243 SAS shelves with DS4243 SATA shelves in the same stack, NetApp recommends separating the shelves into stacks so that physical failures affecting one tier of data availability will not directly affect both tiers of storage being hosted (in this example).
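
The "Prioritize Recommendations" rule for mixed environments can be sketched as picking the most sensitive tier hosted on a system and then applying that tier's settings wherever recommendations conflict. The sensitivity ranking below is an assumption: it simply follows the order in which the tiers are presented in this post, and the function name is hypothetical.

```python
# Assumed ranking, most sensitive first (the order the tiers appear in this post).
TIER_SENSITIVITY = ["mission-critical", "business-critical", "repository", "archival"]


def most_sensitive_tier(hosted_tiers):
    """Return the most sensitive tier among those hosted on one system."""
    hosted = {tier.lower() for tier in hosted_tiers}
    unknown = hosted - set(TIER_SENSITIVITY)
    if unknown:
        raise ValueError(f"unknown tier(s): {sorted(unknown)}")
    if not hosted:
        raise ValueError("no tiers given")
    # The first match in the ranking is the most sensitive hosted tier.
    return next(tier for tier in TIER_SENSITIVITY if tier in hosted)


# Example from the text: SAS shelves host mission-critical data,
# SATA shelves host archival data on the same system.
print(most_sensitive_tier(["Archival", "Mission-Critical"]))
```

In that example, conflicting system-wide settings would follow the mission-critical recommendations.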


These recommendations are drawn from the full NetApp technical report, "Storage Best Practices and Resiliency Guide".

1 comment:

  1. > Set the RAID option raid.min_spares_count
    Correct option is raid.min_spare_count (w/o 's').