Barton Dozier, Senior Engineer
An Update on File-based Storage Systems
The file-based storage solutions currently in the WIRD portfolio have distinct best-in-class features and have seen some notable improvements, which are discussed below. In this article, I would like to explain and position each of WIRD’s file-based storage offerings and highlight their distinguishing features.
During this confinement period, some storage industry commentators have taken a wistful look back at the origins of NFS and commended its longevity (http://storagegaga.com/a-paean-to-nfs/). After 36 years of existence, NFS is, together with SMB, the de facto standard way to share files within an organisation. Sure, both have been superseded by File Sync and Share technology for some use cases, but it seems obvious that these ubiquitous protocols have a few years ahead of them, not least because of the support from VMware.
First, of course, is NetApp, the bellwether of file-based storage platforms. With the latest ONTAP release providing more functions that truly add value, it is clear that this platform will maintain its leadership position for years to come. We call this offering a unified storage platform as it offers not only NFS and SMB, but also FC, iSCSI and NVMe/FC, and since version 9.8 it supports S3 as well. The platform is available in appliance form, as software only for hypervisor or bare-metal deployments, and in the Cloud. In my opinion NetApp has the most advanced integration of Cloud services, the result of a clear strategy, for the benefit of customers, of working closely with popular Cloud offerings rather than opposing or obstructing the thriving adoption of this model.
ONTAP 9.8 offers improvements in FlexCache that make it a compelling way to share large volumes of files in a single namespace spread across different sites. FlexCache is a caching technology that creates sparse, writable replicas of volumes on the same or on a different cluster that remain consistent, coherent and current, and is most useful for hot volume performance balancing, file distribution and software-build environments, media sharing and other use cases.
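To make the idea of a sparse, writable cache that stays consistent with its origin more concrete, here is a minimal Python sketch. It is an illustration only and not ONTAP code: the class names OriginVolume and SparseCache, the per-file version counter and the check against the origin on every read are all assumptions chosen for simplicity. A real implementation would use a far more efficient coherency mechanism.

```python
class OriginVolume:
    """Hypothetical origin volume: the authoritative copy of every file."""

    def __init__(self):
        self.files = {}  # path -> (version, data)

    def write(self, path, data):
        # Every write bumps the file's version so caches can detect staleness.
        version = self.files.get(path, (0, b""))[0] + 1
        self.files[path] = (version, data)
        return version

    def read(self, path):
        return self.files[path]

    def version(self, path):
        return self.files[path][0]


class SparseCache:
    """Hypothetical sparse cache: holds only the files that were actually read."""

    def __init__(self, origin):
        self.origin = origin
        self.cached = {}          # path -> (version, data), filled on demand
        self.origin_fetches = 0   # how many times data was pulled from the origin

    def read(self, path):
        entry = self.cached.get(path)
        # Fetch on a miss, or when the origin holds a newer version (coherency).
        if entry is None or entry[0] != self.origin.version(path):
            self.cached[path] = self.origin.read(path)
            self.origin_fetches += 1
        return self.cached[path][1]

    def write(self, path, data):
        # Writable cache: the write goes through to the origin first.
        version = self.origin.write(path, data)
        self.cached[path] = (version, data)


origin = OriginVolume()
origin.write("/build/a.o", b"v1")
origin.write("/build/b.o", b"other")

cache = SparseCache(origin)
cache.read("/build/a.o")           # first read pulls the file from the origin
cache.read("/build/a.o")           # second read is served from the cache
origin.write("/build/a.o", b"v2")  # the file changes at the origin
latest = cache.read("/build/a.o")  # stale copy is detected and refreshed
```

In this sketch the cache never holds /build/b.o because nobody at the remote site asked for it, which is what "sparse" means here: only the hot part of the volume travels between sites.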
Also included in that release are improvements in FabricPool, ONTAP’s storage tiering solution. It is now supported with SnapLock, includes a pre-retrieve function from the capacity tier to the performance tier, and allows for longer cooling periods before data are automatically tiered. There are also notable improvements in MetroCluster and in DR capabilities with SnapMirror.
Next is IBM, which has had an impressive run with Spectrum Scale, which started life as GPFS in 1998. Last year IBM introduced the ESS 5000, the third-generation appliance that integrates Spectrum Scale into a high-density HA unit housing a bunch of hard disk drives (HDD). Note that earlier in 2020 the 2U NVMe-based ESS 3000 was introduced, also a third-generation offering.
Although the IBM ESS units resemble the NetApp FAS appliances in that they are based on a dual-controller unit, this Software Defined Storage platform can be deployed in a variety of ways; it can be a single node in a tiered storage solution, a two-node HA implementation, or even thousands of nodes in order to enjoy the benefits of massive parallelisation. It is often deployed in a Network Shared Disks (NSD) architecture, where all of the nodes in the cluster can access the disks by using a local or networked connection.
Storage professionals who consider the IBM ESS typically have a particular challenge to address; it could be a new AI workload, the need to ingest vast amounts of data, or the need to conserve rack space. The density of the ESS 5000, at 375TB per rack unit, is top notch, and the 55GB/s of read throughput over InfiniBand outshines all other comparable units. Some of our customers were looking to overcome a performance bottleneck in their existing NAS that impaired metadata-intensive operations. Spectrum Scale provides scalable metadata management by allowing all nodes of the cluster that access the file system to perform file metadata operations. Other customers have strict encryption requirements that the ESS can fulfil with file-level encryption. Each application node can have the same encryption key or a different key, allowing for secure scale-out processing or multi-tenancy on shared storage hardware. Larger customers with a global reach may need to provide high-speed access to a global namespace that unifies file and object and integrates support for a local cache; the ESS could be a first step in that direction.
Weka.IO is a much newer player in file-based storage that has made a splash in the HPC area. The Weka File System, WekaFS, pulverised the SPEC SFS 2014 benchmark results in 2019 and nobody has since had the gumption to go up against them. The clean-sheet design of WekaFS was conceived in the NVMe and Cloud era. As such, the software is a scale-out solution optimized for NVMe flash storage, with the option to tier into object storage for higher-capacity namespaces. The performance tier leverages 100Gb or 200Gb networking between the storage cluster nodes for ultra-low latency, with support for Ethernet and InfiniBand.
Weka is a great choice when applications need a combination of high bandwidth, IOPS and metadata performance with sub-250-microsecond latency for reads and/or writes. Put briefly, WekaFS offers ultra-high performance for all workloads without the need to tune the system to specific needs. Its performance scales linearly as more nodes are added to the storage cluster, allowing the infrastructure to scale with the increasing demands of the business. Paradoxically, it is even faster than local storage. Alas, you will need to ditch your faithful NFS protocol to achieve the ultra-high-performance figures. NFS at 36 years old is a bit like Roger Federer at 39: a force to be reckoned with, but not as fast as younger talents.
Application servers that require ultra-high performance can use the WekaFS client, which presents a POSIX-compliant parallel file system. Data can also be accessed via NFS, SMB or S3, but those client interfaces are not able to deliver the industry-leading performance of the Weka client.
The WekaFS solution unifies a performance tier for hot data with the option to add an object storage capacity tier for colder data. A single global namespace is presented to the applications and data is automatically tiered to the object storage system for long-term retention either on-demand or based on policies. All file metadata remains on the performance tier so any file is easily retrieved at a later date if needed by an application.
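The tiering behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not WekaFS code or its API: the names TieredNamespace, FileEntry and max_idle_days are hypothetical, the policy is a simple idle-time threshold, and recalled data is simply moved back to the performance tier. The point it illustrates is that metadata for every file stays on the performance tier, so a tiered file remains visible in the namespace and can be read back transparently.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Metadata record; in this sketch it always lives on the performance tier."""
    size: int
    last_access_day: int
    tier: str = "performance"  # where the file's data currently resides


class TieredNamespace:
    """Hypothetical single namespace over a performance tier and an object tier."""

    def __init__(self, max_idle_days):
        self.max_idle_days = max_idle_days  # assumed policy: idle time before tiering
        self.metadata = {}          # path -> FileEntry (never tiered out)
        self.performance_data = {}  # path -> bytes held on flash
        self.object_store = {}      # path -> bytes held in object storage

    def write(self, path, data, today):
        self.metadata[path] = FileEntry(len(data), today)
        self.performance_data[path] = data
        self.object_store.pop(path, None)

    def apply_policy(self, today):
        """Move data that has been idle for too long to the object tier."""
        moved = []
        for path, entry in self.metadata.items():
            idle = today - entry.last_access_day
            if entry.tier == "performance" and idle >= self.max_idle_days:
                self.object_store[path] = self.performance_data.pop(path)
                entry.tier = "object"
                moved.append(path)
        return moved

    def read(self, path, today):
        """Read through the namespace; tiered data is recalled transparently."""
        entry = self.metadata[path]
        if entry.tier == "object":
            self.performance_data[path] = self.object_store.pop(path)
            entry.tier = "performance"
        entry.last_access_day = today
        return self.performance_data[path]


ns = TieredNamespace(max_idle_days=30)
ns.write("/genomes/s1.bam", b"ACGT", today=0)
ns.write("/genomes/s2.bam", b"TTGA", today=25)
moved = ns.apply_policy(today=40)            # only s1 has been idle for 30+ days
visible = sorted(ns.metadata)                # both files are still in the namespace
recalled = ns.read("/genomes/s1.bam", today=41)
```

The application only ever calls read and write on one namespace; which tier holds the bytes is a decision made by the policy, not by the application.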
Taking a contemporary approach to ease of management and deployment, the entire solution can be up and running in hours. Designed when the S3 protocol from AWS was already an established norm, the solution works with all major Cloud offerings and object storage platforms that support that protocol.
Genomics England (GEL) have set a highly ambitious goal to sequence five million whole genomes by 2024, which translates into constituting a genome library in the hundreds of petabytes. In use at GEL since 2018, Weka.IO has delivered a 10x performance improvement over the legacy flash-based NAS and is enabling more effective use of existing cloud infrastructure. GEL are convinced that only Weka.IO can deliver the performance and cost effectiveness to succeed.
Many new applications in research (genomics, bio imaging), in AI (Machine Learning, Analytics) and in finance will need this type of file storage. With GPU processing power and networking technology making such great strides in performance, it was only natural that a player step up to the challenge and design a parallel filesystem able to keep pace with the data volumes that can be processed. Weka.IO has done so brilliantly.
Barton Dozier, Sales Consultant, WIRD Representation Office Geneva, barton.dozier@wirdgroup.com