It’s no secret that companies hoard data, amassing large quantities of it and prioritizing volume over value, often without a clear strategy. According to a report by Veritas Data Genomics, 83% of IT decision-makers surveyed admitted that their company hoards data, and a Forbes survey found that 95% of companies identify the management of unstructured data as an operational challenge.
This indiscriminate data collection not only makes it difficult to extract meaningful insights, leaving organizations with large piles of data garbage, but also drives up storage and management costs.
Data, whether created by people or AI, is a commodity, and the strategy of ‘collecting all data’ has been widely embraced in the technology sector and various other industries since the early 2010s. We’re on track to create 180 zettabytes of data globally by 2025, so collecting, storing, networking, and using data is only going to get more difficult.
Data creation has been significantly influenced by a combination of factors. The COVID-19 pandemic, for instance, forced a shift toward remote work and e-commerce, and as businesses digitized their operations and customer interactions, data generation surged. To illustrate the scale, a 2020 LinkedIn Pulse report states that every person generates 1.7MB of data every second.
Similarly, the emergence of the large language models that power generative AI applications has added a new dimension to data creation, given the enormous datasets required to train them. OpenAI’s GPT-4, the model behind ChatGPT, is reported to use on the order of trillions of parameters, up from “only” the billions used in GPT-3. As these AI models become more popular, the volume of data they produce will only continue to grow.
All things considered, the amount of data created isn’t the only problem. Veritas research from 2021 reports that 85% of enterprise data isn’t business-critical, and 50% of overall data is either redundant or obsolete. So, let’s face it – not all data is good data, and a large portion of it is garbage.
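To make the redundancy problem concrete, here is a minimal sketch (not from the report; record fields and values are hypothetical) of how duplicate records can be flagged by hashing their normalized contents:

```python
import hashlib

# Hypothetical customer records; the second is a duplicate of the
# first once whitespace and letter case are normalized.
records = [
    {"customer": "Acme", "email": "ops@acme.example"},
    {"customer": "acme ", "email": "OPS@acme.example"},
    {"customer": "Globex", "email": "it@globex.example"},
]

def fingerprint(record):
    """Hash a record's normalized field values to detect duplicates."""
    normalized = "|".join(str(v).strip().lower() for v in record.values())
    return hashlib.sha256(normalized.encode()).hexdigest()

seen, unique, redundant = set(), [], []
for r in records:
    fp = fingerprint(r)
    (redundant if fp in seen else unique).append(r)
    seen.add(fp)

print(f"{len(redundant)} of {len(records)} records are redundant")
```

Real deduplication pipelines are far more involved (fuzzy matching, survivorship rules), but even a fingerprinting pass like this can reveal how much of a dataset is pure repetition.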
In addition to the ‘garbage’ challenge, organizations that collect large amounts of data tend to create ‘data silos’. These silos emerge for various reasons: structural limitations of software applications, political dynamics within organizations, growth through acquisitions leading to incompatible systems, and vendor lock-in strategies. Silos not only increase costs but also hinder a company’s transition to becoming data-driven. Covanta, a company specializing in sustainable waste-to-energy solutions, did away with its data silos by centralizing all its data in a cloud accessible to every department. This not only made the data more actionable but also let the company collect, share, and work with its enterprise data more effectively, resulting in a 10% annual reduction in data maintenance costs.
Another common challenge is deriving insights from non-actionable data. A survey by research firm Gartner indicates that organizations attribute an average annual loss of $15 million to poor data quality. Flawed data, in other words, can have costly consequences. For example, Unity Technologies, known for its popular real-time 3D content platform, experienced a significant data quality incident in Q1 2022. Its Audience Pinpoint tool, designed to aid game developers in targeted player acquisition and advertising, ingested bad data from a large customer. This caused major inaccuracies in the training sets for its predictive ML algorithms, leading to a dip in performance. As a result, Unity’s revenue-sharing model took a direct hit, costing the company approximately $110 million.
The fast-paced creation of data, driven by the misguided strategy that more data equals more value, has led us to a landscape littered with data garbage and silos. This creates a lot of noise that makes extracting meaningful business insights from the data difficult, resulting in a need to rethink and re-strategize the approach to data.
Companies that wish to tune out the noise are advised to evaluate the purpose of their data collection before, not after, data is collected, and to tie that data to actionable business objectives. We also recommend centralizing data collection in a single hub, ensuring data is not kept in silos where alternate versions of the truth influence the decisions of different departments within a single organization.
In the end, those who come out on top are those who can leverage their data to inform business decisions, and not those who simply hoard the most of it.