What's the oldest file on your computer? A scan of my hard drive turned up a handful from the late 1990s as well as a 2GB trove of interview notes and call recordings from a book I wrote in 2006.

The chance that I'll ever use this information is next to zero, but I keep it around because, well, I can. The relentless decline in the cost of storage has made it cheaper to retain information than to throw it away.

Hoarding isn't such a good idea when it comes to data, however. If I were working for a corporation and some clients I interviewed in 2006 exercised their legal right to be forgotten, my company could be on the hook for my pack-rat behaviour.

"Human beings don't like to delete stuff," said Bill Tolson, vice president of global compliance and e-discovery at Archive360, a data migration and management company.

 

Organisational ROT

The result is that, by some estimates, as much as 80% of the information businesses and their employees have is outdated or useless. Information governance professionals have a term for this: ROT (redundant, obsolete, trivial).

There's a myth that companies that aren't subject to industry-specific regulations are immune from liability for keeping old data on hand, but nearly every organisation is regulated these days. Under the General Data Protection Regulation (GDPR) in Europe, similar legislation in the US, and privacy restrictions being enacted in more than 120 countries around the world, keeping data longer than it's needed is a risk to any organisation.

Regulation is just one of several reasons to clean out your hard drive. The most well-defended corporate databases can't protect against a malware attack on a home PC or information unintentionally left in the open on a cloud server. The more data a company collects, the bigger the attack surface.

"Why spend money to protect data you don't need and why keep it someplace a hacker can take advantage of?" said Sue Trombley, managing director of thought leadership at data and records management giant Iron Mountain. Ransomware doesn't distinguish between good and bad data, and no one wants to pay to recover something that shouldn't have been there in the first place.

 

Costs can be deceptive

Then there's cost.

"Storage is cheap but the people to manage it aren't cheap," said Trombley. Data needs to be protected and backed up and the cost mounts with volume. And if the information is ever subject to a legal proceeding, costs can skyrocket.

In an oft-cited 2002 analysis of electronic discovery costs covering nine cases, DuPont reported that half of the more than 75 million pages of documents that were reviewed were past the company's required retention period, resulting in millions in unnecessary review fees. It’s safe to say the figure would be much higher today. 

Other costs are harder to estimate, such as the impact of poor business decisions based on outdated information, confusion caused by conflicting information or time spent sifting through useless data looking for something of value. "If the average employee spends two hours per week looking for information, what does that contribute to the overall cost?" asked Tolson. "What revenue could they have generated instead?"

Despite compelling arguments for throwing away unnecessary data, few organisations restrict the use of personal storage devices or cloud file shares. "They don't think about it," Tolson said. "It's at the bottom of the list of things they may address someday."

 

AI to the rescue?

Technology offers a partial solution. Data catalogue software automates the process of discovering and categorising data across an organisation. Most data catalogue vendors also offer discovery features that can find data on corporate servers, individual PCs and cloud storage. Many can even flag or automatically delete old records based on company policies.
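
To make the idea of flagging old records concrete, here is a minimal Python sketch of an age-based scan. It is not how any particular data catalogue product works. The function name `find_stale_files`, the `max_age_days` threshold and the use of last-modified time as a stand-in for "old" are all assumptions made for illustration. The sketch only reports candidates and deletes nothing.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_stale_files(root, max_age_days, now=None):
    """Return (path, age_in_days) pairs for files under `root` whose
    last-modified time is more than `max_age_days` in the past.

    Hypothetical helper: real tools would also consider content,
    ownership and legal holds, not just modification time.
    """
    now = time.time() if now is None else now
    stale = []
    for path in Path(root).rglob("*"):
        try:
            if not path.is_file():
                continue
            age_days = (now - path.stat().st_mtime) / SECONDS_PER_DAY
        except OSError:
            # Unreadable entry (permissions, broken link): skip it
            # rather than abort the whole scan.
            continue
        if age_days > max_age_days:
            stale.append((path, int(age_days)))
    # Oldest files first, so the likeliest clean-up candidates lead the list.
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Example: list files in the current directory untouched for roughly seven years.
    for path, age in find_stale_files(".", max_age_days=7 * 365):
        print(f"{age:>6} days  {path}")
```

Reporting rather than deleting is a deliberate choice here: a person, or a policy, still has to decide what is safe to remove.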

A more lasting solution is to implement data governance standards that define how users should manage data responsibly, including the use of meta-tags, limits on making copies and record-retention schedules. Thanks to the wake-up call of privacy regulations, "large organisations have become savvy about records retention," said Trombley.
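
A record-retention schedule can be as simple as a table that maps each category of record to the number of years it must be kept. The Python sketch below shows one way that might look. The categories and retention periods in `RETENTION_YEARS` are invented for illustration and are not legal guidance, and the function names are hypothetical.

```python
from datetime import date

# Hypothetical schedule: record category -> years to retain.
# Real periods depend on jurisdiction and industry and should come
# from legal or compliance staff.
RETENTION_YEARS = {
    "interview_notes": 3,
    "contracts": 7,
    "tax_records": 10,
}


def disposal_date(created, category, schedule=RETENTION_YEARS):
    """Return the first date on which a record may be disposed of."""
    years = schedule[category]  # KeyError for an unclassified category
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # A record created on 29 February, with a target year that is not a leap year.
        return created.replace(year=created.year + years, day=28)


def is_due_for_disposal(created, category, today, schedule=RETENTION_YEARS):
    """True if the record's retention period has ended as of `today`."""
    return today >= disposal_date(created, category, schedule)


if __name__ == "__main__":
    created = date(2006, 5, 1)
    print(disposal_date(created, "interview_notes"))
    print(is_due_for_disposal(created, "interview_notes", date.today()))
```

Raising an error for an unclassified category, instead of silently keeping or deleting the record, nudges users towards tagging data when it is created.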

In the long term, Tolson believes technology will find a solution. "You have to change the company culture to actively manage old data and put policies in place to cull it when it's no longer needed," he said. "An artificial intelligence system should be able to do this transparently."

As long as it doesn't touch those old audio files on my PC.

 

This article was written by Paul Gillin from Computerworld and was legally licensed through the Industry Dive publisher network. Please direct all licensing questions to legal@industrydive.com.