Dark Data: Why You Need to Know About It

Successful companies thoroughly understand their customers, market, and competitors. One proven and popular path to these insights is the analysis of large amounts of collected data, otherwise known as big data.

Organizations typically gather big data both intentionally and as a side effect of normal business operations. Big data analytics can help businesses identify opportunities for improvement, spot patterns in customer behavior, and assess the results of strategy implementation.

Yet many companies have neither the interest nor the ability to analyze most of the vast amounts of data they end up with. According to a 2015 IBM news release, an estimated 90% of all data generated by sensor devices (e.g., tablets and smartphones) and analog-to-digital conversions is never analyzed or used.

Organizations often keep this data to meet compliance regulations or in the hope of using it in the future. This untapped information is called dark data.

Read on to learn what dark data is, how it could impact your organization, and what steps you can take to manage it.

What is Dark Data?

Gartner’s IT Glossary defines dark data as information that organizations collect and store through their everyday business activities but do not use. It typically goes unused for purposes such as analytics, direct monetization, or business relationships, often because traditional database tools cannot readily access it. Dark data is generally unstructured, poorly labeled, and stored in formats that are hard to work with.

In some cases, an organization may not even be aware that it is collecting dark data. Types of dark or legacy data include untapped internal data; nontraditional unstructured data, such as audio, video, and image files; and data hidden behind firewalls, such as deep web data. Other technological sources include:

  • Vehicular informatics
  • Telecommunications 
  • Wireless communications
  • Global navigation satellite systems
  • In-transit data from network transactions
  • Industrial networks like sensor machines and devices

Examples of dark data from these sources could include:

  • Raw survey data
  • Geo-location data
  • Financial statements
  • Customer call records
  • Email correspondence
  • Former employee data
  • Surveillance video footage
  • Log files from servers and systems
  • Old notes, documents, and presentations

Companies aren’t the only ones who accumulate dark data. Many people unwittingly build it up when downloading documents from the internet, because they lack consistent habits for labeling and filing.

Any downloaded document that isn’t clearly labeled is likely to pile up on your computer or in the cloud, unused and hard to find later. These documents become part of your personal dark data.

Why Should I Care About Dark Data?

There are several reasons to care about dark data, and they are likely to become more pressing over time.

Data protection laws

In 2018, the European Union’s groundbreaking General Data Protection Regulation (GDPR) took effect. It restricts how organizations collect and use personal data. Similar regulations followed in 2020 with the California Consumer Privacy Act (CCPA) and Brazil’s General Data Protection Law (LGPD).

These privacy regulations, introduced to protect consumer data and limit the sale of personal information, present new challenges for businesses that collect data. Non-compliance can result in steep fines, and privacy infractions can damage a company’s reputation and prospects. Businesses that collect data should therefore familiarize themselves with these regulations and make sure their practices comply.

Many internet users are now familiar with the opt-in and opt-out choices offered on most websites. Consumers can also request access to the personal data a business holds about them; under the CCPA, this covers data collected over the preceding 12 months. Such requests extend to personal data that has ended up as dark data, so information that is unstructured or hard to locate can leave a company unable to respond fully, creating legal exposure.
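To see why, consider what answering an access request involves. The Python sketch below assumes documents have already been gathered as (name, creation date, text) tuples and that a customer is identified by email address; the function name `find_personal_data`, the sample documents, and the 365-day lookback (mirroring the CCPA’s 12-month window) are illustrative assumptions, not a compliance tool.

```python
from datetime import datetime, timedelta


def find_personal_data(documents, email, now, lookback=timedelta(days=365)):
    """Return names of documents created within the lookback window that mention email."""
    needle = email.lower()  # case-insensitive match
    cutoff = now - lookback
    return [
        name
        for name, created, text in documents
        if created >= cutoff and needle in text.lower()
    ]


# Hypothetical sample data for illustration.
now = datetime(2021, 6, 1)
documents = [
    ("call_notes_0412.txt", datetime(2021, 4, 12), "Caller JANE@EXAMPLE.COM asked about refunds"),
    ("export_2019.csv", datetime(2019, 1, 5), "jane@example.com,premium"),
    ("meeting_notes.docx", datetime(2021, 5, 20), "Quarterly planning, no customer data"),
]
print(find_personal_data(documents, "jane@example.com", now))
# ['call_notes_0412.txt']
```

A plain substring match like this misses scanned images, audio, misspellings, and alternate identifiers, which is exactly the kind of data that tends to go dark.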

Internal security

While dark data is typically unorganized and unstructured, it may also contain sensitive, proprietary information that could be dangerous in the wrong hands. As data breaches become more common, companies that don’t organize or safeguard their dark data may face serious security risks. If a breach includes consumers’ personal data, it can also expose those consumers to identity fraud.

Lost opportunities 

Effective data analysis

Dark data that goes unused limits an organization’s ability to produce the most useful analysis. Analytics tools produce the highest-quality results when they have access to complete data, and leaving dark data untouched shrinks the pool of analyzable information. Additionally, the 2015 IBM news release mentioned above notes that as much as 60% of dark data begins to lose value immediately after it is generated.

Untapped data potential

Organizations without skilled data analysts or budgets for third-party service providers may be missing out on the untapped potential of this information. The available extraction and analytics tools can be expensive and require skilled personnel to manage.

Dark data analysis may reveal nuanced and valuable customer, business, and operational insights that your existing structured data does not. These insights could deepen your knowledge of areas such as:

  • How long customers stay on a web page
  • At what point customers typically exit a web page
  • How consumers interact with loyalty programs
  • Customer feedback through call-in records
  • What affects consumer behavior and spending trends
  • What affects investment trends
  • When customers are likely to contact a business through a support channel
  • Network security and activity patterns
  • Traffic patterns from mobile geo-location data

As your competitors take advantage of their previously unused data, you may lose revenue opportunities or market share unless you do the same.

In today’s competitive market, data is currency. Given its sheer volume, dark data is a potentially critical source of knowledge for improving business operations. By expanding the amount of data they analyze, organizations can find new ways to build competitive advantages. Businesses that fail to make use of these new forms of data risk falling behind their competitors.

Storage space

One of the costlier problems with dark data is the poor return on investment in storage space. As your unorganized data grows, it consumes storage you could otherwise use for more accessible information. According to an article by the New York Times, data centers can waste as much as 90% of the energy they draw, and keeping data that is never used adds to that burden. Greater storage needs mean higher overhead costs, which are already a concern for most organizations.

What Should You Do with Your Dark Data?

If you’re wondering how your business can protect and make better use of its data while remaining compliant with privacy regulations, consider the suggestions below.

Not all dark data is created equal. Depending on your industry, some of it was never valuable, and, as noted above, as much as 60% may lose value soon after it is generated. Much of the remainder is likely unstructured, unformatted, and unlabeled, which makes it difficult to access. These characteristics present unique challenges for companies considering an investment in dark data extraction and analytics tools.

According to a survey conducted by Computer Weekly, 60% of organizations have inadequate business intelligence reporting capabilities, and 65% of respondents said their content management approaches lack organization.

Companies can take the following steps to manage their dark data better and prepare it for analysis. They can also apply the same steps to incoming data:

  1. Regularly audit and cull your data. This pruning requires staff to structure old data or assign it categories and labels, making it more accessible.
  2. Apply strong encryption to your data, both on in-house servers and in cloud storage.
  3. Create safe disposal and data retention policies aligned with the National Institute of Standards and Technology (NIST) Guidelines for Media Sanitization (SP 800-88). Policies should clearly define the criteria for erasing and retaining data.
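The audit and retention steps above can be combined into a simple triage pass. The Python sketch below assumes each item has already been inventoried as a `DataRecord` with a last-access date, any labels assigned during the audit, and a `legal_hold` flag; these names and the three-year `RETENTION_PERIOD` are hypothetical placeholders for whatever your own retention policy specifies. It only sorts items into buckets and deletes nothing, leaving actual erasure to a sanitization process such as the one the NIST guidelines describe.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical policy value; set it from your own retention requirements.
RETENTION_PERIOD = timedelta(days=3 * 365)


@dataclass
class DataRecord:
    path: str
    last_accessed: datetime
    labels: list[str] = field(default_factory=list)  # categories assigned during the audit
    legal_hold: bool = False  # hypothetical flag for data that must be kept


def triage(records, now):
    """Sort records into 'retain', 'needs_label', and 'dispose' buckets."""
    buckets = {"retain": [], "needs_label": [], "dispose": []}
    for record in records:
        if record.legal_hold:
            buckets["retain"].append(record.path)
        elif now - record.last_accessed > RETENTION_PERIOD:
            buckets["dispose"].append(record.path)  # candidate only; nothing is deleted
        elif not record.labels:
            buckets["needs_label"].append(record.path)
        else:
            buckets["retain"].append(record.path)
    return buckets


# Hypothetical inventory for illustration.
now = datetime(2021, 6, 1)
inventory = [
    DataRecord("contracts/vendor_2017.pdf", datetime(2017, 3, 1), ["contract"], legal_hold=True),
    DataRecord("downloads/report_final_v2.pdf", datetime(2021, 5, 1)),
    DataRecord("logs/app_2016.log", datetime(2016, 2, 1), ["logs"]),
    DataRecord("crm/customers.csv", datetime(2021, 5, 20), ["customer"]),
]
result = triage(inventory, now)
print(result["needs_label"])  # ['downloads/report_final_v2.pdf']
print(result["dispose"])      # ['logs/app_2016.log']
```

Checking the legal hold first means a record that a regulation requires you to keep is never scheduled for disposal, however old it is.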

Use advanced technologies to optimize dark data’s value

For many companies, turning unstructured data into usable assets involves lengthy, mostly manual processes that are often cost-prohibitive. To use resources more efficiently, companies need to automate this process.

Advances in technologies such as computer vision, cognitive analytics, and pattern recognition make this automation more attainable, at least for large corporations willing and able to invest in the necessary tools and skilled employees. These tools can make it easier to process and explore unstructured dark data.
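As a small illustration of automating this conversion, the Python sketch below turns free-form server log lines (one of the dark data examples listed earlier) into structured records that ordinary tools can count and filter. It uses a hand-written regular expression, a rule-based stand-in rather than the computer vision or cognitive analytics tools mentioned above; the log format, `LOG_PATTERN`, and `structure_logs` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$"
)


def structure_logs(lines):
    """Split raw lines into parsed records (dicts) and lines the pattern could not parse."""
    records, unparsed = [], []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            records.append(match.groupdict())
        else:
            unparsed.append(line)
    return records, unparsed


sample = [
    "2021-03-04 10:15:22 ERROR payment-service Timeout after 30s",
    "2021-03-04 10:15:40 INFO auth-service User login succeeded",
    "free-form note: restarted box manually",
    "2021-03-04 10:16:03 ERROR payment-service Timeout after 30s",
]
records, unparsed = structure_logs(sample)
errors_by_service = Counter(r["service"] for r in records if r["level"] == "ERROR")
print(errors_by_service)  # Counter({'payment-service': 2})
```

Lines that do not match are returned rather than discarded, so staff can review them and refine the pattern.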

Machine learning

Machine learning, a branch of artificial intelligence (AI), is one of these analytical tools. It allows systems to learn from data and to run continuously, completing tasks in a fraction of the time it would take to do the same job manually.

In the case of dark data, machine learning can build observation models that look for patterns. When working correctly, such a system alerts users to exceptions, which they can choose to address or ignore. The system learns from users’ reactions and automatically offers a similar solution the next time such an event occurs.
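The alert-and-feedback loop described above can be sketched in a few lines of Python. This is a deliberately simple stand-in for a trained model: it flags a value as an exception when it lies more than `threshold` standard deviations from that metric’s past values (a z-score test), and it treats "a similar event" as another outlier on the same metric. The class name, the `"ignore"` action, the metric name, and the default thresholds are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


class FeedbackAlerter:
    """Flags outlying values per metric and remembers how users responded."""

    def __init__(self, threshold=3.0, min_history=5):
        self.threshold = threshold        # z-score cutoff (hypothetical default)
        self.min_history = min_history    # values needed before alerting starts
        self.history = defaultdict(list)  # metric -> baseline values
        self.responses = {}               # metric -> last action the user chose

    def observe(self, metric, value):
        """Return an alert dict for an outlier, or None."""
        past = self.history[metric]
        if len(past) >= self.min_history:
            mu, sigma = mean(past), stdev(past)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                # Outliers return early, so they never join the baseline.
                action = self.responses.get(metric)
                if action == "ignore":
                    return None  # the user dismissed this kind of alert before
                return {"metric": metric, "value": value, "suggested_action": action}
        past.append(value)
        return None

    def record_response(self, metric, action):
        """Store the user's reaction so the next similar alert reuses it."""
        self.responses[metric] = action


alerter = FeedbackAlerter()
for v in [10, 11, 9, 10, 12, 10]:
    alerter.observe("logins_per_hour", v)

first = alerter.observe("logins_per_hour", 50)   # no prior response to suggest
alerter.record_response("logins_per_hour", "lock_account")
second = alerter.observe("logins_per_hour", 55)  # suggests the remembered action
alerter.record_response("logins_per_hour", "ignore")
third = alerter.observe("logins_per_hour", 60)   # suppressed
print(first["suggested_action"], second["suggested_action"], third)
# None lock_account None
```

A production system would likely replace the z-score test with a learned model and track responses at a finer grain, but it would follow the same detect, ask, remember, and suggest pattern.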

Machine learning can play a vital role in helping businesses discover unused information and insights they would otherwise overlook. These insights can help organizations make more informed decisions about their incoming data and guide them toward practical steps for handling it.

Implementing machine learning systems requires internal structural changes that can be costly in both time and money. For many companies, however, the benefits may deliver a high return on that investment.

Data visualization

Less costly alternatives to machine learning exist, such as data visualization tools. These tools connect all of your data sources and present them on a single dashboard, giving real-time visibility into your compiled data. Businesses can use them to sort through their dark data and discover information that is unused yet valuable.

Conclusions

Many businesses agree that unused dark data represents lost opportunities. For many of them, however, accessing, understanding, and using this data is a daunting challenge, and the required investment in new processes, labor, and technology can strain budgets. Companies that have embraced big data collection but fail to exploit their dark data miss out on advantages that could determine their success.

By investing in skilled staff and in technologies such as machine learning, businesses can combine structured and unstructured data to generate valuable results.

Tapping into dark data may give organizations knowledge and insights that yield competitive advantages and improve their bottom line.

Mateus Oliveira
