
What is Data Unification?
Data unification is the process of merging data from disparate sources, including your NAS, cloud buckets, deep archives, DAMs, and MAMs, into a single searchable and analyzable repository.
What challenges does Data Unification solve?
Most organizations have accumulated hundreds of TBs, if not PBs, of data across multiple storage silos (NAS, cloud, archive) and sites. This data is managed without central oversight, resulting in poor data management practices across content discovery, search, backup, DR, archive, and migration.
Data unification is designed to address this growing challenge by creating an overarching, intelligent management framework consisting of connectors, extractors, databases, and services. This framework allows administrators and decision-makers to make better decisions about their data and workflows.
DNAfabric: The first data unification platform for media and unstructured data
While data unification has existed across IT systems, DNAfabric is the first data unification platform for unstructured data and media.
DNAfabric works by connecting to disparate data and metadata stores, including NASes, cloud, object stores, archives, MAMs, and DAMs. It then extracts metadata and builds an intelligent, searchable, and analyzable repository, unlocking key insights across utilization, growth, cost, and duplication while giving you a global view of your data, no matter where it lives.
This allows decision-makers to improve processes across backup, archive, and even large migrations; plan resource usage; and reduce costs.
DNAfabric: How the data unification process works
Step 1: Connecting to data stores
Connecting to multiple, disparate data sources is the first step to analyzing data and workflows. To perform this task, DNAfabric implements multiple data connectors depending on the data source. The connectors can be divided into the following groups.
- File-Systems: This represents any data store that can be accessed as a file-system. To enable access to these data sources, DNAfabric utilizes open file-system clients (NFS, CIFS, or any POSIX-compliant file-system client provided by the file-system vendor) to connect to the file-system data source.
- Object Stores: This represents any object-based data store. To enable access to these data sources, DNAfabric utilizes S3-compatible or other object-based protocols, including Azure Blob, Google Cloud Storage, etc.
- Remote Stores: Cloud-based storage platforms such as Dropbox, Google Drive, etc. are not typical object stores but are a popular destination for data storage. To enable access to these data stores, DNAfabric utilizes custom API connectors.
- UDP Stores: UDP data stores such as File Catalyst and Expedata are also supported. To enable access to UDP stores, DNAfabric utilizes custom UDP-based APIs.
- Deep Archive Stores: Deep archives represent a multitude of archival platforms including Archiware, XenData, Spectra BlackPearl, Quantum StorNext, and more. To enable access to deep archives, DNAfabric utilizes custom APIs.
- MAM/DAM: Increasing amounts of data and metadata are being hosted in MAM and DAM solutions both on-premise and in-cloud. To enable access to MAM/DAM platforms, DNAfabric utilizes custom APIs.
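The connector groups above share one job: presenting very different back ends through a uniform listing interface. DNAfabric's internal design is not public, so the following is only a minimal sketch of what such an abstraction could look like, with a file-system connector and an S3-style connector; the class and method names are illustrative assumptions, not DNAfabric's API.

```python
import os
from abc import ABC, abstractmethod

class Connector(ABC):
    """Uniform interface over a data source (hypothetical, not DNAfabric's API)."""

    @abstractmethod
    def list_objects(self):
        """Yield (path_or_key, size_in_bytes) for every item in the store."""

class FileSystemConnector(Connector):
    """Walks an NFS/CIFS mount (or any POSIX path) via the local file system."""

    def __init__(self, mount_point):
        self.mount_point = mount_point

    def list_objects(self):
        for root, _dirs, files in os.walk(self.mount_point):
            for name in files:
                full = os.path.join(root, name)
                yield full, os.path.getsize(full)

class ObjectStoreConnector(Connector):
    """Lists an S3-compatible bucket; 'client' is any object exposing a
    list_objects_v2(Bucket=...) method, e.g. a boto3 S3 client."""

    def __init__(self, client, bucket):
        self.client, self.bucket = client, bucket

    def list_objects(self):
        resp = self.client.list_objects_v2(Bucket=self.bucket)
        for obj in resp.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

Downstream services can then iterate over any connector without caring whether the items live on a NAS mount or in a cloud bucket.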
Step 2: Extracting metadata into an open database
Once connected, extractors sync metadata (both file-level and, where available via APIs, application-level) to a centralized, open JSON database. This database can be hosted on-premise or in-cloud. Additionally, it can be operated and owned entirely by the customer, enabling fully open access to all extracted information.
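To make the extraction step concrete, here is a minimal sketch of pulling file-level metadata into newline-delimited JSON records, standing in for the centralized JSON database. The field names and the checksum choice are illustrative assumptions; DNAfabric's actual schema is not documented in this article.

```python
import hashlib
import json
import os
from datetime import datetime, timezone

def extract_metadata(path):
    """Build one JSON-serializable metadata record for a file.
    Field names are illustrative; DNAfabric's schema is not public."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "path": path,
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        # A content checksum later enables duplicate detection (Step 3).
        "sha256": digest,
    }

def sync_to_database(mount_point, db_file):
    """Append one JSON document per file (newline-delimited JSON),
    a stand-in for a centralized, open JSON database."""
    with open(db_file, "a") as db:
        for root, _dirs, files in os.walk(mount_point):
            for name in files:
                record = extract_metadata(os.path.join(root, name))
                db.write(json.dumps(record) + "\n")
```

Because the records are plain JSON, any downstream tool (not just DNAfabric) can query them, which is the point of keeping the database open and customer-owned.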
Step 3: Search, discovery and analysis services
Once DNAfabric connectors merge metadata into a centralized database, multiple services can be enabled. This section focuses on the search and discovery services essential to making high level decisions across your data and storage silos.
Service | With Unification | Without Unification
---|---|---
Browse and Search | A global browse and search pane allows any asset to be located across any connected data silo. | Without a centralized interface, admins use multiple point tools to query and locate an asset, resulting in poor re-purposing workflows.
Cost Analysis | A cost analysis tool polls cloud cost APIs and builds accurate, predictive cost models that allow decision-makers to drive data storage cost-saving decisions. | Without an accurate way to compute costs, admins have relied on spreadsheets with out-of-date costs and storage metrics.
Storage Planning | A storage analysis tool tracks multiple storage metrics per directory across all connected data silos. This allows decision-makers to make key decisions on how to archive, migrate, sync, and back up data. | Without storage tracking and planning tools, admins and decision-makers use spreadsheets to make storage utilization and resource planning decisions, often resulting in inefficient decisions based on out-of-date information.
Media De-Duplication | A first-of-its-kind media de-duplication tool allows media duplicates across shared NAS, cloud, object, and archive stores to be identified and cleaned up. Media de-dupe has the potential to drive massive cost savings across NAS, cloud, archives, and migrations. | Without media de-dupe, data silos continue to grow as end users create multiple copies across finished projects, re-ingested media, and restored copies. Duplicate data is a major factor in multiplying storage costs.
Ransomware Probing | An intelligent media scanning tool capable of detecting early traces of ransomware, hardware failures, and metadata corruption across shared NAS, HDDs, and other filesystems. It allows malware to be detected early, preventing widespread failures. | Without probing tools designed to detect media errors, administrators do not detect ransomware attacks, storage failures, and other forms of corruption until it is too late. This can result in massive data loss and, consequently, financial losses.
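The media de-duplication service in the table relies on recognizing identical content across silos. The article does not describe DNAfabric's matching method, so the sketch below uses a common general technique, grouping files by content checksum, purely to illustrate the idea of exact-duplicate detection.

```python
import hashlib
from collections import defaultdict

def find_duplicates(paths):
    """Group files by content checksum; any group with more than one
    entry is a duplicate set. Hashing full contents is one common way
    to find exact duplicates -- an illustrative stand-in, not
    DNAfabric's documented method."""
    by_hash = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        by_hash[digest].append(path)
    # Keep only checksums that occur more than once.
    return {h: p for h, p in by_hash.items() if len(p) > 1}
```

In practice a unification platform would run this over the metadata database rather than re-reading files, which is why capturing checksums during extraction (Step 2) matters.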
Step 4: Action services
While a data unification platform primarily addresses the need to make data decisions easier, DNAfabric also provides multiple action tools to manage data. It is important to note that action decisions such as backup, archive, or migration do not necessarily need to be driven by DNAfabric; they can also be driven by a third-party tool.
The following are some of the key action services provided by DNAfabric.
- Backup and DR: DNAfabric enables backup, snapshot, and DR services across a multitude of endpoints including file-system, object, and LTO/LTFS backends. It allows organizations to build robust site-to-site, on-site, and site-to-cloud backup and disaster recovery workflows.
- Archive and Tiering: DNAfabric enables archiving and tiering services across a multitude of endpoints including file-system, object, and LTO/LTFS backends. Organizations can implement comprehensive archiving and tiering workflows to reclaim primary storage and utilize either LTO/LTFS or object buckets for long-term archiving.
- Migration: DNAfabric assists with large-scale migrations as well. It can not only provide key analytics across existing archive and storage pools but also move and migrate data between data stores including file-systems, object, and LTO/LTFS.
- Syncing: DNAfabric has built-in UDP and sync tools to keep storage pools synchronized across sites and cloud. With sync tools, DNAfabric is able to keep collaborative groups across sites and cloud synced.
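The archive and tiering service above amounts to applying a policy, such as "move anything untouched for N days off primary storage," across connected stores. The following is a minimal sketch of such an age-based tiering policy under stated assumptions: the 90-day threshold, the move-based approach, and the function name are all illustrative, not DNAfabric's documented behavior.

```python
import os
import shutil
import time

def archive_cold_files(primary_dir, archive_dir, max_age_days=90):
    """Move files not modified within max_age_days from primary storage
    to an archive location, preserving relative paths. A minimal sketch
    of an age-based tiering policy; parameters are illustrative."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for root, _dirs, files in os.walk(primary_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) < cutoff:
                dst = os.path.join(archive_dir,
                                   os.path.relpath(src, primary_dir))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.move(src, dst)
                moved.append(dst)
    return moved
```

A production tier would typically leave a stub or database record behind so the asset remains discoverable after it moves, which is exactly what the unified metadata database from Step 2 provides.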
DNAfabric: Key Benefits and Conclusion
With the complexity of data management in today’s environments, a data unification platform is essential for making better data management decisions. A data unification platform offers the following high-level benefits:
- Ability to locate data across all storage silos thus reducing data copies and accelerating data re-purposing.
- Ability to analyze data and storage metrics thus enabling better decision-making across backup, DR, archive, and migration workflows.
- Ability to calculate and predict storage costs across on-premise and cloud to enable better cost and storage resource planning.
- Ability to analyze media and unstructured data duplicates across all storage silos, enabling storage reduction and improved migration and archive practices.
- Ability to centralize actions such as backup, archive, sync, and migration.
A data unification platform like DNAfabric represents the next step in data management, enabling massive efficiencies across workflows.