Data availability tiers - recommendations for NetApp and IBM N series storage

NetApp feature descriptions and usage recommendations for each data availability tier.

Mission-Critical – high-demand services, e.g. OLTP, batch transaction processing, and virtualization/cloud environments.

Flash Cache	Use Flash Cache to improve system performance and minimize the impact on foreground I/O in degraded mode.
SyncMirror	Use local SyncMirror to provide shelf-level resiliency and to improve performance in degraded mode.
Spares	Use a maximum hot spares approach so that sufficient disks are available for corrective actions. Set the RAID option raid.min_spares_count to the recommended number of spares so that the administrator is notified when the spare count falls below the recommendation.
Drive Type	Use performance drives (SAS, FC, or SSD) instead of capacity drives (SATA). Smaller-capacity 15k rpm or SSD drives result in shorter times for corrective actions. This matters when foreground I/O is prioritized over corrective I/O, because that prioritization lengthens corrective actions; performance drives help offset the difference.
Aggregate Fullness	Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures further degrade foreground I/O performance when drives are near full data capacity.
Utilization Monitoring	Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If utilization is greater than 50%, you are at increased risk of greater foreground I/O degradation in degraded mode. High utilization can also increase the time it takes for corrective actions to complete.
I/O Prioritization	Prioritize foreground I/O over corrective I/O by setting the RAID option raid.reconstruct.perf_impact to Low.
Scrubs	Use the default settings for RAID scrubs and media scrubs. Systems in this tier are assumed to be highly utilized, so increasing the duration of scrubs will likely provide a reduced benefit to data integrity while consuming additional system resources.
Maintenance Center	Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. It also facilitates the RMA process for failed drives so that the system returns to a normal operating state in a timely manner.
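
The 50% utilization guideline in the table above can be sketched as a simple check. This is only an illustration in Python: the function name, the metric names, and the sample values are hypothetical, and how the utilization figures are actually collected from the storage system is left out.

```python
# Hypothetical helper: flags resources whose utilization exceeds the 50% guideline.
UTILIZATION_THRESHOLD = 50.0  # percent, taken from the guideline in the table


def over_threshold(samples, threshold=UTILIZATION_THRESHOLD):
    """Return the names of resources whose utilization is strictly above threshold.

    `samples` maps a resource name to its utilization in percent.
    """
    return sorted(name for name, pct in samples.items() if pct > threshold)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real system.
    samples = {"cpu": 62.0, "disk": 41.5, "loop_stack_bandwidth": 50.0}
    for name in over_threshold(samples):
        print(f"{name}: above {UTILIZATION_THRESHOLD:.0f}%, "
              "expect more foreground I/O degradation in degraded mode")
```

The comparison is strict because the guideline speaks of utilization "greater than 50%", so a resource sitting at exactly 50% is not flagged.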

Business-Critical – data kept to meet compliance requirements and/or protect intellectual property, e.g. medical records, software source code, and e-mail.

Flash Cache	Use Flash Cache to improve system performance and minimize the impact on foreground I/O in degraded mode.
SyncMirror	Use local SyncMirror to provide shelf-level resiliency and to improve performance in degraded mode.
Spares	Use a maximum hot spares approach so that sufficient disks are available for corrective actions. Set the RAID option raid.min_spares_count to the recommended number of spares so that the administrator is notified when the spare count falls below the recommendation.
Drive Type	Use performance drives (SAS, FC, or SSD) instead of capacity drives (SATA). Smaller-capacity 15k rpm or SSD drives result in shorter times for corrective actions. This matters when foreground I/O is prioritized over corrective I/O, because that prioritization lengthens corrective actions; performance drives help offset the difference.
Aggregate Fullness	Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures further degrade foreground I/O performance when drives are near full data capacity.
Utilization Monitoring	Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If utilization is greater than 50%, you are at increased risk of greater foreground I/O degradation in degraded mode. High utilization can also increase the time it takes for corrective actions to complete.
I/O Prioritization	Use the default setting of Medium for the RAID option raid.reconstruct.perf_impact to balance foreground I/O and corrective I/O.
Scrubs	Consider increasing the frequency of RAID scrubs to increase the integrity of data at rest.
Maintenance Center	Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. It also facilitates the RMA process for failed drives so that systems return to a normal operating state in a timely manner.

Repository – used to store collaborative data or user data that is noncritical to business operations.

Flash Cache	Use Flash Cache to improve system performance and minimize the impact on foreground I/O in degraded mode.
SyncMirror	Use local SyncMirror to provide shelf-level resiliency and to improve performance in degraded mode.
Spares	Use a balanced hot spares approach so that more disks can be used for system capacity. Set the RAID option raid.min_spares_count to the recommended number of spares so that the administrator is notified when the spare count falls below the recommendation.
Drive Type	Consider using SATA drives (backed by Flash Cache) for these types of configurations.
Aggregate Fullness	Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures further degrade foreground I/O performance when drives are near full data capacity.
Utilization Monitoring	Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If utilization is greater than 50%, you are at increased risk of greater foreground I/O degradation in degraded mode. High utilization can also increase the time it takes for corrective actions to complete.
I/O Prioritization	Use the default setting of Medium for the RAID option raid.reconstruct.perf_impact to balance foreground I/O and corrective I/O.
Scrubs	Consider increasing the frequency of RAID scrubs to increase the integrity of data at rest.
Maintenance Center	Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. It also facilitates the RMA process for failed drives so that systems return to a normal operating state in a timely manner.

Archival – a large initial ingest of data (write) that is then seldom accessed. The priority is maintaining data integrity.

Spares	Use a maximum hot spares approach so that sufficient disks are available for corrective actions. Set the RAID option raid.min_spares_count to the recommended number of spares so that the administrator is notified when the spare count falls below the recommendation.
Drive Type	Consider using SATA drives (backed by Flash Cache) for these types of configurations.
Aggregate Fullness	Monitor aggregate “fullness”, because performance degrades as disks fill up (the drive heads need to travel farther to complete I/Os). Drive failures further degrade foreground I/O performance when drives are near full data capacity.
Utilization Monitoring	Monitor CPU utilization, disk utilization, and loop/stack bandwidth. If utilization is greater than 50%, you are at increased risk of greater foreground I/O degradation in degraded mode. High utilization can also increase the time it takes for corrective actions to complete.
I/O Prioritization	Use the default setting of Medium for the RAID option raid.reconstruct.perf_impact to balance foreground I/O and corrective I/O.
Scrubs	Consider increasing the RAID scrub duration (raid.scrub.duration) to help protect the integrity of data at rest. Consider increasing the media scrub rate (raid.media_scrub.rate) to increase drive-level block integrity.
Maintenance Center	Maintenance Center is recommended to enable intelligent triage of suspect drives in the field. It also facilitates the RMA process for failed drives so that systems return to a normal operating state in a timely manner.
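
Across the four tiers above, the I/O Prioritization rows differ only in the value of raid.reconstruct.perf_impact. A minimal Python sketch of that mapping follows. The dictionary and function names are hypothetical, and the generated line assumes the 7-Mode style `options <name> <value>` console syntax; check the command reference for your Data ONTAP release before using it.

```python
# Values taken from the I/O Prioritization rows of the four tier tables.
PERF_IMPACT_BY_TIER = {
    "mission-critical": "low",      # foreground I/O prioritized over corrective I/O
    "business-critical": "medium",  # default: balance foreground and corrective I/O
    "repository": "medium",
    "archival": "medium",
}


def perf_impact_command(tier):
    """Build the console line that sets raid.reconstruct.perf_impact for a tier.

    Assumption: the system accepts `options <name> <value>`.
    Raises KeyError for a tier name that is not in the tables.
    """
    value = PERF_IMPACT_BY_TIER[tier.lower()]
    return f"options raid.reconstruct.perf_impact {value}"


if __name__ == "__main__":
    for tier in PERF_IMPACT_BY_TIER:
        print(f"{tier}: {perf_impact_command(tier)}")
```

The function only builds the command text; it does not connect to a storage controller, so the output can be reviewed before anything is applied.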

Multipurpose – mixed environment.

Prioritize Recommendations	Prioritize configuration recommendations for the most sensitive tier of data availability when conflicting recommendations are present.
FlexShare®	Consider using FlexShare to prioritize system resources between data volumes.
Physical Segregation	Segregate the physical shelf and drive layout by data-availability tier. For example, if you have both SAS and SATA shelves (DS4243) attached to the same system, you could use the SAS drives to host mission-critical data and the SATA drives to host archival data. Although you can mix DS4243 SAS shelves with DS4243 SATA shelves in the same stack, NetApp recommends separating the shelves into different stacks so that a physical failure affecting one tier of data availability does not directly affect the other tier hosted on the system (in this example).
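
The "most sensitive tier wins" rule for conflicting recommendations can be sketched in Python as a simple merge. The sensitivity order below is an assumption: it follows the order in which this post lists the tiers, which the source does not state as a ranking. The function name and the sample settings are hypothetical as well.

```python
# Assumed sensitivity order, most sensitive first (the post's listing order).
TIER_ORDER = ["mission-critical", "business-critical", "repository", "archival"]


def resolve(recommendations_by_tier):
    """Merge per-tier settings; on a conflict the most sensitive tier wins.

    `recommendations_by_tier` maps a tier name to a dict of setting -> value.
    Tiers that are not hosted on the system can simply be left out.
    """
    merged = {}
    # Walk from least to most sensitive so that more sensitive tiers overwrite.
    for tier in reversed(TIER_ORDER):
        merged.update(recommendations_by_tier.get(tier, {}))
    return merged


if __name__ == "__main__":
    hosted = {
        "mission-critical": {"raid.reconstruct.perf_impact": "low"},
        "archival": {
            "raid.reconstruct.perf_impact": "medium",
            "scrubs": "increase duration",
        },
    }
    print(resolve(hosted))
```

Settings that only one tier mentions pass through unchanged, so the merge affects only the settings on which the hosted tiers actually disagree.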

For the full details, see the NetApp technical report "Storage Best Practices and Resiliency Guide".

IT, storage & something else

Wednesday, May 18, 2011
