Data Protection Series: The Upcoming Zettabyte Apocalypse

Topics: Data Backup and Archiving

Data is growing at an incredible rate, and we’re running out of places to store it. So how do you ensure your data protection strategy is future-proof? Listen as experts John Woolley, Head of Technical Sales at Iron Mountain, and Jon Toigo, CEO and Managing Principal for Toigo Partners International and chairman of the Data Management Institute, share their thoughts on the matter in our 4-part podcast series.

Part 1: What is the “zettabyte apocalypse”?


Part 1 - Transcript


Moderator: Hello and welcome to today’s podcast on best practices for managing the upcoming ‘Zettabyte Apocalypse.’

Joining us for today’s discussion are two leading data management experts, Jon Toigo and John Woolley.

First, I’d like to introduce Jon Toigo. He is CEO and Managing Principal for Toigo Partners International. Jon is also chairman of the Data Management Institute. He is the author of 15 books, including five on the subject of business continuity planning. Welcome Jon.

Jon Toigo: Thank you.

Moderator: Next, I’d like to introduce and welcome John Woolley, Head of Technical Sales at Iron Mountain. For the past 10 years, John has been an evangelist for data centre virtualisation and data management. In his current role, he defines and drives Iron Mountain’s Cloud Data Management solutions. Welcome John.

John Woolley: Yes, great. Thank you for having me.

Moderator: Our discussion today will focus around managing the upcoming ‘Zettabyte Apocalypse.’ So what is this upcoming ‘Zettabyte Apocalypse’ and how can IT organisations get prepared from a capacity and cost perspective? Jon T. – do you want to start here?

Jon T.: Sure, actually ‘Zettabyte Apocalypse’ is probably something I coined in an article along the way. If you read the analysts, the leading ones: IDC, Gartner, etc., they’re all expecting a gigantic boom in data growth. In terms of zettabytes, we’re looking at 30-40 zettabytes of new data, depending on which analyst you believe, by the 2020 time frame. And, quite frankly, there isn’t enough capacity in all the flash that’s being produced and all the disk that’s being produced and all the optical that’s being produced to shoulder the burden of storing all of this data. That’s a big issue for the Cloud guys who are trying to create big infrastructure offerings for storage, but also a big issue for larger, and even medium-sized, organisations going forward because, of course, all that storage capacity has a price tag associated with it. Virtually the only way we’re going to be able to handle the ‘Zettabyte Apocalypse’ is to bring all storage modalities to bear and to practise archiving, and to archive to tape. Tape is basically the godsend in this scenario because of the huge capacity increases that are anticipated over the next couple of years. John?

John W.: Yeah, I agree with you. There are a lot of considerations, and tape, in my view for a long time, whilst it’s fallen out of fashion somewhat for its traditional use…people are being forced to reconsider its position. We’re looking at the moment at what we’re calling the decriminalisation of tape. It’s been given a bad press by disk-based data protection specialists because their agenda is to sell more disks.

Jon T.: The good news there is that, for a lot of younger IT practitioners, it seems like they don’t know tape at all. They have no pre-existing attitude toward it, positive or negative, because they’ve never used the technology, and they’ve never even heard of it in some cases. And I think that’s hopefully a point in tape’s corner going forward.

John W.: Absolutely, they’re my best case for common sense. You’re right. We talk to the disk vendors, and they’re in a quandary about how they pack more and more data onto spindles. You know, you’ve got the new LTO formats, you’ve got the enterprise tape formats that are cramming more and more data onto the cassette, and we’re able to keep it for longer periods of time, so in an archive play, this is fantastic. It’s going to be a big question, and I think this is where a lot of the debate will be coming. When I talk to IT professionals and they say ‘I no longer want tape in my environment,’ what they forget is the verb to manage, and I think that’s the core piece. If we can find a way to manage the data on their tapes in a much more grown-up, correct way, so that it’s much more accessible, I think that takes away the stigma around tape and allows us to overcome these challenges. The other big question that comes with this wave of data is, should we be keeping it all? That’s my question: why do we keep every piece of data that an organisation creates? Not all of it is relevant.

Jon T.: Well, that is a very good question. I think that’s the one that everyone’s going to be struggling with over the next 5-6 years. If there is no indication on the data itself of its importance…there are certain regulatory mandates and certain cases and certain industries that carry with them a requirement to hold on to a certain kind of data, but very few companies in my experience are actually going through the heavy lift to sort out and classify the data they are creating to determine what is critical, what is less important and what is discardable. The Herculean effort of the era is to determine what belongs where.

Moderator: That wraps up today’s podcast. Thanks to both Jon Toigo and John Woolley for joining us today and for sharing their expert insights. Visit the Iron Mountain UK or NA websites for more insights and thought leadership around data management and data protection. Have a great day.



Part 2: Treating data as a risk and an asset


Part 2 - Transcript


Moderator: Data comes in different forms and from different sources. What are best practices to treat your data as both a risk and an asset? John W. – what are your thoughts on this?

John W.: The best practice is realistically trying to figure out what you have, figuring out what different departments need and looking at it from the regulatory perspective. Your accounting is the first place, contracts will be second, human resources third. These all have very clearly defined retention policies in different countries. The key here is to actually understand the retention policy but then put something in that’s automated and will delete. Because without that, it’s not defensible. Then there are your massive, petabyte-sized pools: how much of that comes from the unstructured stuff the masses produce, whether it’s PowerPoint, video or audio, and how long a lifespan does it have? And that’s where, I think, a lot of these guys can get to grips. There’s always that argument, right? It’s like hoarding boxes at home, where you don’t want to bin that particular item, or delete or throw away those particular letters, just in case…
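The automated, policy-driven deletion John W. describes can be sketched roughly as follows. This is a minimal illustration only: the category names, the retention periods and the `Record` type are hypothetical, and real retention periods vary by country and need to come from legal counsel. Records without a classification are deliberately left alone rather than deleted.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention periods in years. Real values depend on jurisdiction.
RETENTION_YEARS = {"accounting": 7, "contracts": 6, "hr": 5}


@dataclass(frozen=True)
class Record:
    name: str
    category: str
    created: date


def expiry_date(record: Record) -> date:
    """Return the date on which the record's retention period ends."""
    years = RETENTION_YEARS[record.category]
    try:
        return record.created.replace(year=record.created.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        return record.created.replace(year=record.created.year + years, day=28)


def due_for_deletion(records: list[Record], today: date) -> list[Record]:
    """Select classified records whose retention period has ended.

    Records in unknown categories are skipped so that nothing is deleted
    without an explicit policy covering it.
    """
    return [
        r for r in records
        if r.category in RETENTION_YEARS and expiry_date(r) <= today
    ]
```

Running the same rule on a schedule, rather than relying on individuals to tidy up, is what makes the deletions systematic and easier to defend.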

Jon T.: Because you know you’ll need it the very next day. But that’s why storage…I think it’s universal from sea to shining sea here: we all have a junk drawer in our kitchen, and our storage is becoming a junk drawer. It stores everything in a nondescript, undefined way, and then we end up with a lot of storage space being consumed by junk. We really need to be more disciplined here, and frankly the industry has peddled a lot of Band-Aid fixes that don’t really fix a problem, they just forestall it a little bit. Take de-duplication. I have no axe to grind with de-duplication, but I have a client in the United States, a financial services firm, that can’t use it because, by all reports, the Securities and Exchange Commission requires them, as a publicly traded company, to provide a full and unaltered copy of data to the SEC. And it’s never been established in court whether de-duplication materially alters data or not. So, while the IT guy has the incentive to crush the data down as much as possible, using whatever technology is available (compression or de-duplication), the sad truth is that there are still lawsuits waiting in the wings to test the veracity of those approaches, and the IT guy and the business managers are often at odds. So sometimes it’s not as clear cut, even though the regulation seems pretty clear about what you need to do with your data and what you need to retain. It’s not exactly up to date with what the technology is.

John W.: And the other final part around best practice is that if you don’t know what you have, or more importantly what you don’t have, both can burn you in litigation just as badly. Any organisation that is selling across these markets, whether it’s US- or UK-based, European… this policy change between the two countries is always going to be difficult to manage. I think if you were to talk to a legal person, the default is to run with the most stringent. So that’s going to put a lot of challenges on U.S.-based companies operating in Europe.

Jon T.: I think there’s also going to be the issue of people not reading their warranty or license agreements, where they are sometimes trading away all of the protections that the law is affording them by simply signing a warranty agreement that has no bearing, or that negates that vendor’s responsibility to adhere to that law. That’s another knotty area here, because, let’s face it, most users aren’t going to read the little disclaimer that comes down with their app; they’re just going to install the app and start using it. Now, the question is how much the law is going to require the vendor to watch out for the consumer versus having the consumer watch out for his own interests. I think that’s another big ground of contention here in how these regulatory mandates are implemented. Probably the most stringent one we have in the United States is HIPAA, for healthcare information. That has to be encrypted. It has to be retained for a certain number of years. There are hard and fast rules set on it, and medical institutions know that they need to design their systems with that in mind. I don’t know that I’m getting such a feel for clear-cut guidance yet in the other areas of personal privacy.

Moderator: That wraps up today’s podcast. Thanks to both Jon Toigo and John Woolley for joining us today and for sharing their expert insights. Visit the Iron Mountain UK or NA websites for more insights and thought leadership around data management and data protection. Have a great day.



Part 3: Getting your data house in order


Part 3 - Transcript


Moderator: The first step in managing this data growth is to understand your data, create policies and make your data defensible. Jon T. – any advice for companies getting their data houses in order?

Jon T.: Sure, I think that it’s becoming increasingly apparent to companies, whether they’re looking at tape or a cloud service (which some think is a form of storage). Usually an effective cloud media service is going to have tape on the back end in any case; at least 2 of the 3 major providers of cloud storage are using tape extensively or plan to. What you have to do first is understand your business processes, then understand the applications that support them, and then understand the data that supports those applications. Unfortunately, at those companies…the last time that kind of association was done, the last time that someone mapped the infrastructure and the data back to the business process, was back when the application was first created and rolled out. That may have been 10 years ago. It’s time to revisit the data that you’ve got sitting on the shelf and figure out what’s associated with each business process, because the business process determines what data is important. If the business process is critical, the data and the applications that support that process are critical. So you have to do this criticality analysis. Now, that’s something you do for disaster recovery, something you do for security planning, something you do for compliance planning. It serves many masters. And it’s an enterprise well worth undertaking. It’s going to require IT people who can understand the infrastructure and business people who use the applications on a daily basis. And they’re going to have to find a way to sit together in the same room and figure things out. It’s not enough anymore to use data attributes like last access and last modified and migrate data simply based on those factors. That is kind of a one-size-fits-all approach, and if you’re slightly on the rotund side, you know that anything that’s one-size-fits-all doesn’t fit you very well.
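The criticality analysis Jon T. outlines, where data inherits its importance from the business process it supports, might be sketched like this. The process, application and dataset names, and the three criticality levels, are all hypothetical placeholders; the point is only the direction of the mapping (process to application to data), not any particular tool.

```python
# Hypothetical criticality levels, ordered from least to most important.
RANK = {"low": 0, "important": 1, "critical": 2}


def dataset_criticality(
    process_criticality: dict[str, str],
    process_apps: dict[str, list[str]],
    app_datasets: dict[str, list[str]],
) -> dict[str, str]:
    """Assign each dataset the highest criticality of any process it supports."""
    result: dict[str, str] = {}
    for process, apps in process_apps.items():
        level = process_criticality[process]
        for app in apps:
            for dataset in app_datasets.get(app, []):
                current = result.get(dataset)
                # Keep the most critical level seen so far for a shared dataset.
                if current is None or RANK[level] > RANK[current]:
                    result[dataset] = level
    return result


# Hypothetical example inventory.
levels = dataset_criticality(
    process_criticality={"order-to-cash": "critical", "marketing-analytics": "low"},
    process_apps={
        "order-to-cash": ["erp", "billing"],
        "marketing-analytics": ["web-stats"],
    },
    app_datasets={
        "erp": ["orders-db"],
        "billing": ["invoices", "orders-db"],
        "web-stats": ["clickstream", "orders-db"],
    },
)
```

A dataset shared by a critical and a non-critical process (here `orders-db`) ends up critical, which is exactly the information that last-access and last-modified timestamps cannot provide.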

John W.: In terms of making it defensible and compliant, what I see from where I sit today, looking at our traditional business (paper, records management), is that records managers have it really locked down in the large corporates. They understand this. It has just failed to roll into the IT groups. Some of the key principles that make common sense to me are things like, first of all: make it easy to find. So if it’s referenceable and you can see where it is, you can make some decisions on keeping it, or even around your policies. The second thing is making it compliant when it goes into any medium you want, and typically we talk about tape being the best for this for cost reasons: make it read-only so you can prove that it’s not been tampered with and it is in its original format, and if anything does change with a document, it becomes a new one. So you need to be able to track the changes. The next real thought we had was to automate the policies, because nothing is more defensible than a systematic policy; so long as human error cannot creep into it, it stands up further in a court of law. Then the final part is about compliance and defence. If you are defending in a case, nothing is as defensible as a neutral third party doing those searches for you. So again, if you make it easy to find and have it in a format that is a lot simpler, it keeps those costs down and allows you to turn around and say to a court: we have made our best efforts to find it, and the search has been done not by us but by a third party and is therefore transparent and open.

So those are just some of the guidelines we are looking at.
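One way to picture the "read-only, and any change becomes a new document" principle is a ledger of content fingerprints. The sketch below is an assumption-laden illustration, not a description of any Iron Mountain product: the `VersionLedger` class and its method names are hypothetical, and it uses SHA-256 from Python's standard `hashlib` module as the fingerprint.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


class VersionLedger:
    """Hypothetical append-only ledger: digests are added, never overwritten."""

    def __init__(self) -> None:
        self._digests: dict[str, list[str]] = {}

    def record(self, doc_id: str, content: bytes) -> int:
        """Register content as a new version and return its 1-based version number."""
        versions = self._digests.setdefault(doc_id, [])
        versions.append(fingerprint(content))
        return len(versions)

    def is_unaltered(self, doc_id: str, version: int, content: bytes) -> bool:
        """Check retrieved content against the digest recorded for that version."""
        return self._digests[doc_id][version - 1] == fingerprint(content)


ledger = VersionLedger()
v1 = ledger.record("contract-001", b"original terms")
v2 = ledger.record("contract-001", b"original terms, amended")
```

Because an amended document is recorded as version 2 rather than replacing version 1, the change history stays trackable, and a later `is_unaltered` check gives evidence that what was retrieved matches what was stored.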

Jon T.: I think it helps if you can locate a consultant or a vendor who’s a trusted partner who can assist you in those efforts as well. Sometimes you need an honest broker between the different sides of the house in the business. Unfortunately, sometimes those IT folks and the business folks don’t speak the same language, and it helps to have a negotiator between them to come up with those policies, to understand the legal and regulatory requirements and translate them into actual policies for data retention and data protection. I agree with John Woolley 100% on that.

John W.: Do you know what, Jon, the scariest part about this is that the law will often see that if you don’t have a retention policy, it’s forever by default.

Moderator: That wraps up today’s podcast. Thanks to both Jon Toigo and John Woolley for joining us today and for sharing their expert insights. Visit the Iron Mountain UK or NA websites for more insights and thought leadership around data management and data protection. Have a great day.



Part 4: The role of tape in managing data growth


Part 4 - Transcript


Moderator: Why must tape play a role in dealing with the ‘Zettabyte Apocalypse?’ John W. – what are your thoughts here?

John W.: Well, tape is critical. Jon said earlier that the vendors themselves cannot create enough hard disk drives, flash drives or removable media in terms of optical to cope with this growth, this storage requirement. That’s first and foremost. When you start to peer into an enterprise and you realise that at the very least about 75% of data in any given house is a copy, you start to see the wasteful nature (of storage).

So tape is critical for a number of reasons.

One. It has a much lower cost of ownership, and a lower cost to procure to start with. When a tape is not being used, unlike a disk drive, it does not have to be powered, it does not have to be cooled and it lasts significantly longer if you treat it in the right way, if you store it in the right environment, so that reason alone is a starting point.

In terms of capacities, tape is outperforming disk, so we can store more per cassette, and each tape cassette costs less to procure in terms of cost per TB; therefore, if you want more copies, it’s better. When I talk to IT professionals, one copy is dangerous, two copies is getting better, three is OK, four is becoming prohibitively expensive. But with tape, for the same price as a disk drive, you could have 5 or 10 more copies of data, of which if you lose one or two, it’s no longer a disaster.

So when you start taking into account the running cost, the cost of ownership and the fact that you can have more copies of it, it will become critical. It will be the only way to keep archive data for the long term, or non-critical data (and I mean this in the best sense for business continuity / disaster recovery): the stuff that you don’t necessarily need online within the first couple of hours of a disaster happening. You can buy yourself some time to recover that from tape. These are the key plays where we see this fitting in. It’s all about getting the right tool for the right job.

Jon T.: John, I would add that a lot of people dismiss tape as an obsolete technology because it isn’t competitive with disk in terms of random access rates. But tape has a very low bit error rate, whereas one in every ninety SATA drives has a non-recoverable bit error on it, and that will spoil your whole day, especially if it’s in a RAID set. It will corrupt all the data that’s there. Your option of course is tape: it is many orders of magnitude more resilient than disk, has a much lower price point per GB and quite frankly it’s an extraordinarily resilient, robust technology for doing everything you’re describing. The last point I would make about this is that tape alone is not going to save everybody. You also need the automation and archival practice in place, whether you’re going to call it HSM, tiered storage or whatever, and about 60% of your data ought to ultimately reside on tape only, because it’s infrequently accessed and very infrequently modified, so you don’t need to have it on your spinning rust at all.

We did a study last year at the Data Management Institute of 3000 firms and discovered that 80% of the space of every disk drive you own on average is occupied by data that shouldn’t be there, whether it’s contraband, junk or duplicates or it’s data that belongs in an archive.

If you did that, just imagine how you could pay for this whole scenario: tell management that you’re going to buy back 70% of the infrastructure they already own, and usually that’s more than enough coin to be able to pay for a decent archive strategy for your company.
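The "buy back" argument is simple arithmetic, sketched below under stated assumptions. The 70% default is the figure Jon T. quotes; the installed capacity and the cost per terabyte in the example are purely hypothetical placeholders, and the function names are illustrative only.

```python
def reclaimable_capacity_tb(total_tb: int, reclaim_percent: int = 70) -> int:
    """Capacity (in whole TB) freed if reclaim_percent of disk space is archived or purged."""
    if not 0 <= reclaim_percent <= 100:
        raise ValueError("reclaim_percent must be between 0 and 100")
    return total_tb * reclaim_percent // 100


def reclaimed_value(total_tb: int, cost_per_tb: int, reclaim_percent: int = 70) -> int:
    """Value of the reclaimed capacity, in the same currency unit as cost_per_tb."""
    return reclaimable_capacity_tb(total_tb, reclaim_percent) * cost_per_tb


# Hypothetical estate: 500 TB of installed disk at 100 currency units per TB.
freed_tb = reclaimable_capacity_tb(500)
freed_value = reclaimed_value(500, cost_per_tb=100)
```

Plugging in an organisation's own capacity and cost figures gives a rough upper bound on the budget that reclaimed disk could contribute toward an archive strategy.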

Moderator: That wraps up today’s podcast. Thanks to both Jon Toigo and John Woolley for joining us today and for sharing their expert insights. Visit the Iron Mountain UK or NA websites for more insights and thought leadership around data management and data protection. Have a great day.