RAID
RAID (Redundant Arrays of Independent Disks)
RAID, or “Redundant Arrays of Independent Disks”, is a technique that combines multiple disks, instead of relying on a single disk, to increase performance, provide data redundancy, or both. The term was coined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.
Why data redundancy?
Data redundancy takes up extra space but improves reliability. If a disk fails and the same data is also stored on another disk, we can retrieve the data and carry on operating. By contrast, if the data is simply spread across multiple disks with no redundancy, the loss of a single disk can make the entire data set unusable.
Key evaluation points for a RAID System
- Reliability: How many disk faults can the system tolerate?
- Availability: What fraction of the total time is the system up, i.e. how available is the system for actual use?
- Performance: How good is the response time? How high is the throughput (the rate at which work is processed)? These are only two of the many parameters that make up performance.
- Capacity: Given a set of N disks each with B blocks, how much useful capacity is available to the user?
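The availability question in the list above can be put as a simple ratio of uptime to total observed time. The sketch below is only an illustration; the function name and the sample hours are assumptions, not figures from any real system.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return the fraction of observed time during which the system was usable."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


# Hypothetical year of operation: 8,751 hours up and 9 hours down.
print(f"availability: {availability(8751, 9):.4%}")
```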
RAID is transparent to the host system: the array appears as a single large disk that presents itself as a linear array of blocks. This allows older storage technologies to be replaced by RAID without many changes to existing code.
RAID Levels
Whether implemented in hardware or software, RAID is available in different schemes, or RAID levels. The most commonly used levels are RAID 0, 1, 5, 6, and 10. RAID 0, 1, and 5 work on both HDD and SSD media. (RAID levels 4 and 6 also work on both media, although RAID 4 is rarely seen in practice.)
RAID 0: Striping
Requiring a minimum of two disks, RAID 0 splits files into blocks and stripes them across two or more disks, treating the striped disks as a single volume. Because multiple drives read and write parts of the same file at the same time, throughput is generally higher.
RAID 0 does not provide redundancy or fault tolerance. Since it treats multiple disks as a single volume, if even one drive fails, the striped file is unreadable. This is not an insurmountable problem in video streaming or computer gaming environments, where performance matters most and the source file still exists even if the stream fails. It is a problem in high-availability environments.
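As a rough illustration of striping, the sketch below deals fixed-size blocks out to the disks in round-robin order and reads them back in logical order. It assumes a stripe unit of one block and in-memory lists standing in for disks; `locate_block`, `stripe`, and `unstripe` are hypothetical names, not part of any RAID implementation.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin placement with a stripe unit of one block.
    """
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    if logical_block < 0:
        raise ValueError("logical block number must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into blocks and deal them out across the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for start in range(0, len(data), block_size):
        disk, _ = locate_block(start // block_size, num_disks)
        disks[disk].append(data[start:start + block_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading blocks in logical order."""
    total_blocks = sum(len(disk) for disk in disks)
    blocks = []
    for n in range(total_blocks):
        disk, offset = locate_block(n, len(disks))
        blocks.append(disks[disk][offset])
    return b"".join(blocks)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, block_size=2)
print(disks)
print(unstripe(disks))
```

Because every disk holds only part of each file, dropping any one of the lists in `disks` leaves `unstripe` with nothing to read for those blocks, which is the lack of fault tolerance described above.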
RAID 1: Mirroring
RAID 1 requires a minimum of two disks and provides data redundancy and failover. It writes the exact same data to each disk, and a read can be served from any of them. Should a mirrored disk fail, the data exists in its entirety on the functioning disk. Once IT replaces the failed disk, the RAID system automatically mirrors the data back to the replacement drive. RAID 1 also increases read performance.
Mirroring does cost usable capacity, since every block is stored on each disk, but it is an economical failover mechanism on application servers.
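The mirroring, failover, and rebuild behaviour can be sketched with a toy two-disk volume. This is a minimal in-memory model under simplifying assumptions (two disks, whole-block writes, no concurrency); the class `MirroredPair` and its methods are hypothetical names used only for illustration.

```python
class MirroredPair:
    """Toy RAID 1 volume: two in-memory 'disks' holding identical blocks."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.disks = [[b""] * num_blocks, [b""] * num_blocks]
        self.failed = [False, False]

    def write(self, block_no: int, data: bytes) -> None:
        # Every write goes to every healthy disk.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data

    def read(self, block_no: int) -> bytes:
        # Serve the read from the first healthy disk.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise IOError("both mirrors have failed")

    def fail(self, disk_no: int) -> None:
        # Simulate a drive failure: its contents are gone.
        self.failed[disk_no] = True
        self.disks[disk_no] = [b""] * self.num_blocks

    def replace(self, disk_no: int) -> None:
        # Rebuild the replacement drive by copying the surviving mirror.
        survivor = 1 - disk_no
        if self.failed[survivor]:
            raise IOError("no healthy mirror to rebuild from")
        self.disks[disk_no] = list(self.disks[survivor])
        self.failed[disk_no] = False


volume = MirroredPair(num_blocks=4)
volume.write(0, b"payroll")
volume.fail(0)            # first disk dies
print(volume.read(0))     # data is still served from the second disk
volume.replace(0)         # new disk is mirrored back from the survivor
```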
RAID 5: Striping with Parity
This RAID level stripes both data and parity across the disks at the block level. Parity is redundant binary data that the RAID system calculates from the data blocks in a stripe, typically with a bitwise XOR, and stores as a parity block. The system uses the parity block to recover striped data from a failed drive. Most RAID systems with parity functions store parity blocks on the disks in the array. (Some RAID systems dedicate a disk to parity, but these are rare.)
RAID 5 requires a minimum of three disks and stores parity blocks on the striped disks themselves. Each stripe has its own parity block. RAID 5 can withstand the loss of one disk in the array.
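The way a parity block lets the array rebuild a lost block can be shown with XOR, which is the usual choice for single-parity RAID. The sketch below is illustrative only: the rotation rule in `parity_disk_for` is one of several possible layouts and is an assumption here, as are all function names.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of one or more equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk_for(stripe_no: int, num_disks: int) -> int:
    """Pick which disk holds parity for a stripe (assumed rotation scheme)."""
    return (num_disks - 1 - stripe_no) % num_disks


def build_stripe(data_blocks: list[bytes], stripe_no: int) -> list[bytes]:
    """Lay out one stripe: the data blocks plus one parity block."""
    num_disks = len(data_blocks) + 1
    stripe = list(data_blocks)
    stripe.insert(parity_disk_for(stripe_no, num_disks), xor_blocks(data_blocks))
    return stripe


def recover(stripe: list[bytes], lost_disk: int) -> bytes:
    """Rebuild the block on a failed disk by XOR-ing all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_disk]
    return xor_blocks(survivors)


stripe = build_stripe([b"\x0f\xf0", b"\x33\x33"], stripe_no=0)
rebuilt = recover(stripe, lost_disk=0)
print(rebuilt == stripe[0])
```

The same `recover` call works whether the lost block held data or parity, since XOR-ing all blocks of a stripe together always yields zero.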
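RAID 5 combines the performance of RAID 0 with the redundancy of RAID 1, but gives up storage space to do it: parity takes the equivalent of one disk, which is about one third of raw capacity in a three-disk array and proportionally less in larger arrays.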
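The capacity trade-off can be worked out for the three levels covered here, using N disks of B blocks each as in the evaluation points earlier. This is a back-of-the-envelope sketch; `usable_blocks` is a hypothetical helper, and it assumes equal-sized disks and that a RAID 1 set mirrors the same data onto every disk.

```python
def usable_blocks(level: int, num_disks: int, blocks_per_disk: int) -> int:
    """Return user-visible capacity in blocks for RAID 0, 1, or 5."""
    minimum_disks = {0: 2, 1: 2, 5: 3}
    if level not in minimum_disks:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if num_disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 0:
        return num_disks * blocks_per_disk        # no redundancy: N * B
    if level == 1:
        return blocks_per_disk                    # every disk holds the same B blocks
    return (num_disks - 1) * blocks_per_disk      # one disk's worth of parity: (N - 1) * B


for level, disks in [(0, 2), (1, 2), (5, 3), (5, 8)]:
    raw = disks * 1000
    usable = usable_blocks(level, disks, 1000)
    print(f"RAID {level}, {disks} disks: {usable}/{raw} blocks usable")
```

With three disks, RAID 5 spends one third of raw capacity on parity; with eight disks the share drops to one eighth.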
This level increases throughput since all drives in the array serve requests simultaneously, which helps reads in particular. However, write performance can suffer from write amplification, since even a minor change to a stripe requires reading the old data and parity, recalculating the parity, and writing both back.
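The extra steps behind a small write can be sketched as a read-modify-write of one data block and its parity block. The example assumes XOR parity and models a stripe as a plain list of byte blocks; `update_block` is a hypothetical name, and the I/O count it returns is simply a tally of the reads and writes performed in the function.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def update_block(stripe: list[bytes], data_index: int,
                 parity_index: int, new_data: bytes) -> int:
    """Overwrite one data block, patch the parity, and return the I/O count."""
    old_data = stripe[data_index]        # read 1: old data
    old_parity = stripe[parity_index]    # read 2: old parity
    # Remove the old data's contribution from the parity, then add the new one.
    new_parity = xor(xor(old_parity, old_data), new_data)
    stripe[data_index] = new_data        # write 1: new data
    stripe[parity_index] = new_parity    # write 2: new parity
    return 4


stripe = [b"\x01", b"\x02", b"\x03"]     # two data blocks, parity last (0x01 ^ 0x02)
ios = update_block(stripe, data_index=0, parity_index=2, new_data=b"\x07")
print(stripe, ios)
```

One logical write turns into two reads and two writes here, which is why small random writes are the weak spot of parity-based levels.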


