The evolution of datacenter storage has come full circle over the years. Traditional Direct Attached Storage (DAS), which was relatively simple and ran only what each server required internally, evolved into cumbersome, large and expensive SAN/NAS systems. In recent years, we have witnessed a shift back to a simpler, more efficient model, driven by various advancements. By dissecting this progression, we can fully understand the transformation that datacenter storage has gone through to reach its current capabilities.
DAS, NAS and Exponential Data Growth
When examining the history of datacenter storage, we have to start where it all began: with DAS. Every application server had its own disks attached to the same box to provide dedicated storage. The DB servers had their own disks with their own protection and redundancy, the security servers had to manage their own local HDDs, and each component was separate.
The next phase in the evolution was the Storage Area Network (SAN), built around a storage appliance that aggregated many disks. The majority of enterprise workloads, such as Oracle, SQL, Exchange and DB2 databases as well as files, were hosted entirely on the external disk array. The controllers mainly supported protocols such as SATA, SAS, or Fibre Channel (FC). The disk array could provide capacity, protection and replication to the entire organization. The IT manager could set platinum, gold and bronze policies to support storage tiers and could enforce central security measures like firewalls, antivirus and system audits accordingly.
The amount of data that an organization produces and consumes can grow exponentially, making storage, deployment and management challenging. Regulations now require that businesses track, store and analyze more information about their users. Workloads such as big data analytics, which involve huge amounts of raw data, are deployed to evaluate data for business intelligence. Multiple copies of this data then need to be saved for high availability.
Overall, live application data accounts for only around 1/4 of your total data; the rest is occupied by snapshots, DR copies, RAID groups and hot spares (which bring you back to production if there is a crisis).
In the recent past, disks mainly used HDD technology, not the non-volatile flash memory that we recognize from today’s devices such as cameras and phones. In order to achieve optimal performance and lower response times, however, more and more spindles (disks) needed to be added. The drawback of this method was that most regular operating systems were unable to manage that many disks. Managing two or three disks attached to a PC is feasible, but handling 400 disks is far more complicated, not to mention storage management tasks such as striping data, calculating RAID parity, managing hot spares, data scrubbing and replicating data across several sites, synchronously or asynchronously, for HA and backup.
As a result of these storage challenges, SAN appliances became cumbersome giants that consumed huge amounts of physical resources such as electricity and space. They also required significant financial investments because they had to be connected to fabric, routers and switches. Additionally, all resources involved had to be redundant. The ratio of live storage to actual capacity became 1:4. Third parties were brought in for software management, and dedicated IT staff were required to manage the environment.
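To make the striping and parity tasks mentioned above more concrete, here is a minimal Python sketch of single-parity striping in the style of RAID 4/5. It is an illustration only, not how any particular array implements it: the function names (`xor_blocks`, `make_stripe`, `rebuild`) are hypothetical, the "disks" are just byte strings in memory, and at least two data disks are assumed.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, n_data_disks):
    """Split data into n equal blocks (zero-padded) and append one parity block."""
    block_len = -(-len(data) // n_data_disks)  # ceiling division
    padded = data.ljust(block_len * n_data_disks, b"\x00")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(n_data_disks)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe, lost_index):
    """Recompute one lost block by XOR-ing all the surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Three data "disks" plus one parity "disk"; then pretend disk 1 has failed.
stripe = make_stripe(b"datacenter storage", 3)
recovered = rebuild(stripe, 1)
```

Because the parity block is the XOR of all data blocks, any single lost block equals the XOR of the remaining ones. A real controller has to do this for every write across hundreds of spindles, which is part of why the task outgrew general-purpose operating systems.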
The Next Phase: Virtualized Storage
In order to facilitate management and operations, the next step was storage virtualization: a new layer that separated the physical block device from the logical storage volume. This enabled features such as live volume migration across different pools, data mobility between fast and slow disks, and elastic storage. In addition, it allowed for smart caching, sync and async replication, application-aware snapshots, and more. Virtualization made it much easier to migrate data between physical appliances and cut project timelines from months to weeks. Consequently, the hassle of purchasing physical devices was reduced, and the underlying physical storage resources could be utilized more efficiently.
However, because virtualized storage is based on central appliances that serve the whole organization, scalability is limited (or expensive), and both the SAN fabric and the complexity of overall storage operations grow.
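The separation between logical volumes and physical block devices described in this section can be pictured as a mapping table. The following Python sketch is a toy model under stated assumptions: `Pool` and `VirtualVolume` are hypothetical names, extents are plain byte strings, and there is no locking or I/O, so it only shows why migration becomes a table update rather than a change visible to the application.

```python
class Pool:
    """A pool of physical extents, e.g. a group of fast or slow disks."""

    def __init__(self, name, n_extents):
        self.name = name
        self.extents = [None] * n_extents  # None marks a free extent

    def allocate(self, payload):
        idx = self.extents.index(None)  # raises ValueError if the pool is full
        self.extents[idx] = payload
        return idx

    def free(self, idx):
        self.extents[idx] = None


class VirtualVolume:
    """A logical volume: each logical extent maps to (pool, physical extent)."""

    def __init__(self, pool, n_extents):
        self.map = [(pool, pool.allocate(b"")) for _ in range(n_extents)]

    def write(self, logical, payload):
        pool, idx = self.map[logical]
        pool.extents[idx] = payload

    def read(self, logical):
        pool, idx = self.map[logical]
        return pool.extents[idx]

    def migrate(self, logical, target):
        """Copy one extent to another pool, then repoint the mapping."""
        source, idx = self.map[logical]
        new_idx = target.allocate(source.extents[idx])
        self.map[logical] = (target, new_idx)
        source.free(idx)


slow, fast = Pool("hdd", 4), Pool("ssd", 4)
volume = VirtualVolume(slow, 2)
volume.write(0, b"hot data")
volume.migrate(0, fast)  # the reader of logical extent 0 sees no difference
```

The application keeps addressing logical extent 0 throughout; only the mapping changes, which is the property that makes live migration and tiering between fast and slow disks possible.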
Great Technological Advancements
Over the last few years, there has been a huge change in modern application architecture. Now, most modern workloads can live in a server with no need for external storage. New technologies, such as Hadoop, Cassandra and other distributed systems, simplify the task of managing a cluster of nodes. For example, complex data analytics workloads that require large amounts of CPU resources can now be distributed across multiple nodes. Additionally, hyper-converged systems have introduced new distributed storage technology that involves simple volumes and SSDs.
SSDs played a major role in the evolution of data center storage, since a single SSD could deliver the workload performance that previously required many spindles. SSDs also consume little electricity and have a small carbon footprint.
A major change in the market that enabled the new approach came from the network. As 10Gb networking became popular and less expensive, data chunks could be streamed rapidly across the cluster. With new hyperconvergence technologies, multiple channels of data could move in parallel across several nodes (i.e., distributed storage) and shift all over the cluster. This allowed for the development of methods and algorithms that support enhanced performance and resilience.
Hyper-Convergence and Distributed Storage
One of the most important changes to data center storage was, first of all, the elimination of RAID controllers, the dedicated hardware that manages the multiple spindles in external storage. In hyper-converged infrastructure, the compute, storage and network subsystems are consolidated into the same box. By attaching an SSD volume to a server, we can expect the data center’s operating system software to be smart and fast enough to share data and capacity with the server’s peers in the cluster.
The network can be relied on to move, back and forth, the chunks of data that were traditionally sent to the external storage appliance. It can move this data synchronously between several nodes, save several copies and, at the same time, apply deduplication and compression to the data that is a good candidate for them. Storage snapshots and replication are enabled within the server itself, without the need for third-party involvement or a dedicated gateway server.
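As a rough illustration of saving several copies of a chunk across nodes while deduplicating identical data, consider the Python sketch below. It is not taken from any hyper-converged product: the `Cluster` class, its content-hash placement rule and the default of two replicas are all assumptions made for the example, and compression is left out to keep it short.

```python
import hashlib


class Cluster:
    """Toy distributed chunk store: content-addressed, replicated, deduplicated."""

    def __init__(self, node_names, replicas=2):
        # Each node is modelled as a dict of chunk_id -> bytes.
        self.nodes = {name: {} for name in node_names}
        self.replicas = replicas  # assumed to be <= number of nodes

    def _placement(self, chunk_id):
        """Pick `replicas` distinct nodes, starting from a hash-derived position."""
        names = sorted(self.nodes)
        start = int(chunk_id, 16) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, chunk):
        """Store a chunk on its replica nodes; identical chunks are stored once."""
        chunk_id = hashlib.sha256(chunk).hexdigest()
        for name in self._placement(chunk_id):
            self.nodes[name].setdefault(chunk_id, chunk)
        return chunk_id

    def get(self, chunk_id, failed=()):
        """Read from the first replica node that is alive and holds the chunk."""
        for name in self._placement(chunk_id):
            if name not in failed and chunk_id in self.nodes[name]:
                return self.nodes[name][chunk_id]
        raise KeyError(chunk_id)


cluster = Cluster(["node-a", "node-b", "node-c"])
first_id = cluster.put(b"same block of data")
second_id = cluster.put(b"same block of data")  # deduplicated: no new copies
```

Keying chunks by their content hash gives deduplication almost for free, and keeping each chunk on more than one node lets a read succeed even when one replica holder is passed in `failed`.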
Throughout the evolution of data center storage, customers and IT managers have become very familiar with cloud technology. For example, if you ask them what the value of cloud storage is, they will probably tell you that the cloud provides you with the space you need for your storage and compute capacity in an elastic, scalable, and on-demand manner. You won’t hear them refer to backend disk vendors because they are simply not relevant in the world of the cloud. A new language has been introduced and the audience is ready to move on and welcome the cloud state of mind. In the public cloud as well as the private cloud, users are looking for smart software that manages their pool of resources with ease.
The evolution of data center storage began with storage attached to a single server and evolved to the point where everything was consolidated into specific silos. Now we are witnessing a return to the earlier approach, although this time everything is more natural and efficient thanks to the advancements along the way.
via Technology & Innovation Articles on Business 2 Community http://ift.tt/2h5zGCD