Data preservation versus data archival
by Rick Bump
Which is better for your data lifecycle management?
As you build a data lifecycle management approach, you will ultimately need to determine when and how to store various types of organizational data. Not all data is created equal: different types of data need to be retrieved at different speeds and frequencies, and therefore need different types of storage.
Two buzzwords within the industry are data preservation and data archival. While seemingly similar, they are actually different approaches, with each having its own benefits and drawbacks. So, what’s the difference between preservation and archival? And how do you know which one is best suited for your business?
Data preservation typically refers to the process of protecting data from being lost, damaged, or destroyed. It involves creating backups, storing data in secure locations, and ensuring that data can be retrieved and accessed when needed. Data preservation is typically done to ensure that data remains available for future use, such as for research, historical documentation, or legal purposes.
Data archiving, on the other hand, involves moving data that is no longer actively used to a separate storage location for long-term preservation. The data is typically combined, compressed, and stored so that it can still be accessed but is not meant to be regularly accessed or updated. The purpose of data archiving is to free up space on primary storage systems, reduce costs, and ensure that important data is preserved for future use. In this way, data archiving can be a component of data preservation, which is more of an overarching strategy.
How do you determine which approach is right for your business?
We believe that data preservation should be the backbone of an organization’s backup storage solution. The reality is that as technology has evolved, preservation solutions have come to market that are cost-effective and reliable options for long-term data storage.
In general, there are three levels of data storage solutions.
- Hot storage – data that is frequently used or accessed is stored in a hot storage solution, typically in memory or on a solid-state drive (SSD). Hot storage is the fastest and most accessible type of storage, and therefore also the most expensive.
- Warm storage – for data that is not regularly used but needs to be accessed from time to time, a warm storage solution is recommended. Warm storage often uses network-attached storage (NAS), making the data available via the network when needed, but offline and secure when not being used. It is slower than hot storage, and typically less expensive.
- Cold storage – data that needs to be preserved but rarely (or never) needs to be accessed can be stored in cold storage. The slowest and least expensive of the three, cold storage keeps data offline on a secure medium such as tape. The data is compressed, so it needs to be decompressed before it is readable, which makes it time-consuming to access.
However, with recent breakthroughs in material science, we’re seeing a fourth storage type emerge. For lack of a better name, let’s call it cool-warm storage. It uses optical discs with increased storage density per disc, which makes it less expensive and easier to use. It is object-based, whereas cold storage is TAR-based, and the files don’t need to be compressed. This makes it much more accessible and movable than true cold storage.
Typically, organizational data moves down this list, either manually or automatically with a tiered storage solution. When it’s first created and is being used regularly, it will live in hot storage. As it becomes older and less relevant, it moves into warm storage. Eventually, it likely only needs to be kept to satisfy data retention policies, so it can be moved to cold storage and maintained there until it is deleted.
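To make the automatic version of this movement concrete, here is a minimal sketch of a tiering rule in Python. The thresholds (30 and 365 days), the function name `choose_tier`, and the idea of driving the decision from a last-accessed date are all assumptions made for illustration; a real tiered storage product will have its own policy settings.

```python
from datetime import date

# Hypothetical thresholds: real values would come from your own access
# patterns and retention policies, not from this article.
HOT_MAX_IDLE_DAYS = 30
WARM_MAX_IDLE_DAYS = 365


def choose_tier(last_accessed: date, retention_ends: date, today: date) -> str:
    """Return the tier a data set belongs in on the given day."""
    # Once the retention period is over, the data should be deleted
    # rather than kept in any tier.
    if today >= retention_ends:
        return "delete"

    idle_days = (today - last_accessed).days
    if idle_days <= HOT_MAX_IDLE_DAYS:
        return "hot"
    if idle_days <= WARM_MAX_IDLE_DAYS:
        return "warm"
    return "cold"


if __name__ == "__main__":
    retention_ends = date(2030, 1, 1)
    last_accessed = date(2024, 1, 1)
    for today in (date(2024, 1, 10), date(2024, 6, 1), date(2026, 1, 1), date(2031, 1, 1)):
        print(today, choose_tier(last_accessed, retention_ends, today))
```

A rule this simple ignores cost and capacity, but it shows the basic shape: the longer data sits unused, the further down the list it goes, and the retention deadline overrides everything else.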
5 questions to determine the right storage solution for your data preservation
To determine the best storage approach for each type of data within your business, start by asking yourself the following five questions.
- What are the data policy obligations? No matter your industry, you likely have data retention requirements and regulations that require certain types of data to be retained for a specified time period. In addition, your company likely has defined its own data retention policies. The reality is that you only really want to keep data as long as is necessary, as maintaining data longer than is required can also put your organization at risk. Understanding how long each type of data needs to be kept will help you determine the best storage solution at each phase throughout its lifecycle.
- How often does the data need to be accessed? Another key consideration for where to store your data is how often you want to be able to access it. While some data is used every day, other types of data are maintained solely for legal purposes and may never need to be accessed or altered. An important component no matter the storage type is to have a system in place to understand what data is stored where and how best to access it when needed. This is why companies need a metadata solution that categorizes data to provide an inventory of what is stored where.
- How much do you want to spend? Costs vary based on a couple of key factors. Data that needs to be accessed regularly will need to be stored in an online, always-on format, making it the most expensive. For data that is being preserved in a warm or cold storage solution, there are different options at varying price points.
  - Tape or disk storage is less stable and therefore needs to be kept in a specific environment. This means you’ll pay more for a climate-controlled location.
  - Optical, on the other hand, can maintain integrity in a less rigid environment, making it a less expensive option.
- How long does the data need to be preserved? Different storage types have different shelf lives. For example, tape needs to be rewritten every 7-10 years. Hard drives only maintain their efficacy for 3 years. Each rewrite requires significant time and resources from your IT team. By dividing how long you need to preserve your data by the shelf life of the storage type, you can calculate how many times you’ll need to migrate it, understand the implications of each type of data storage, and choose the one that is best for your needs.
- How much capacity is required? Different types of storage hold different capacities. For example, a single-layer Blu-ray disc holds 25 GB, far more than the roughly 700 MB on a single CD. An optical drive can hold 250 discs’ worth of data. Knowing how much capacity you’ll need over time will help you understand which storage type will fit your business. It’s important to not just account for today, but also plan for the future so that you don’t immediately outgrow your current storage solutions.
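The migration arithmetic behind the shelf-life question can be sketched in a few lines of Python. This is an illustration only: the function name `migrations_needed` and the 30-year retention period are assumptions, and the shelf lives are the figures quoted above. It divides the retention period by the media shelf life, rounds up, and subtracts one because the first copy covers the first shelf life.

```python
import math


def migrations_needed(retention_years: float, media_life_years: float) -> int:
    """Estimate how many times data must be rewritten to fresh media."""
    if retention_years <= 0 or media_life_years <= 0:
        raise ValueError("retention and media life must both be positive")
    # Number of media lifetimes needed to span the retention period.
    lifetimes = math.ceil(retention_years / media_life_years)
    # The initial write is not a migration, so subtract it.
    return max(0, lifetimes - 1)


if __name__ == "__main__":
    retention = 30  # hypothetical retention period in years
    print("tape, 7-year rewrite cycle:", migrations_needed(retention, 7))    # 4
    print("tape, 10-year rewrite cycle:", migrations_needed(retention, 10))  # 2
    print("hard drive, 3-year life:", migrations_needed(retention, 3))       # 9
```

Multiplying each result by the staff time and cost of one migration gives a rough way to compare storage types over the full retention period.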
While data archival certainly has its uses within a data lifecycle management approach, data preservation is a proactive approach that seeks to ensure the ongoing usability and accessibility of an organization’s data. By taking steps to maintain the integrity of data over time, organizations can avoid the costly and time-consuming process of restoring data that has been lost or corrupted due to storage failures, format obsolescence, or other factors. Learn more about how to build an effective data lifecycle management approach in our FAQ.