In recent years, data has been quietly evolving in a way that might have snuck past you if you weren’t paying attention: the transition from unstructured to structured data.
This is why I enjoy being directly involved in the progression of technology over the long term. You really gain an appreciation for the steady march forward of each element of the overall capacity for information technology. Obviously, speeds and feeds get a lot of focus in the media, as they’re more tangible, but speeds and feeds are worth nothing without good software and information management.
With the advent of the internet, and of imaging technology such as cameras and scanners, data generation has exploded. The vast majority of that data is unstructured, meaning it is not consistently indexed beyond the systems individuals put in place for themselves, such as a folder hierarchy. Those one-off systems are of little value at scale. The same can be said for huge amounts of business data.
IBM reports: “It is estimated that 80% of the world’s data is unstructured, but businesses are only able to gain visibility into a portion of that data. Innovative companies are using data to enhance their value proposition and increase customer satisfaction. However, because it is hard to understand and find meaning in data that is text-heavy, companies have a difficult time creating insights that could ultimately shape decisions that are made within a company.”
Website data is among the most practical examples I like to share. It used to be that each website chose its own way to list its address and operating hours. Since every site did this in its own way, there was no reliable way to parse and index all locations. It was just part of the glut of unstructured internet data. I’m sure savvy web developers could find a way to scrape the addresses, but it would have been extremely difficult to do so accurately at a massive scale.
Then along came Google Maps (Google Places), which created a structured database of places and addresses. In other words, everyone now records their address and hours in a standardized data table. This critical change is what lets you ask the Google Assistant for directions or when a place will close. This structured data served as a foundation for a huge number of truly practical applications.
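To make the difference concrete, here is a minimal sketch in Python of what a standardized place record makes possible. The `Place` fields, the sample businesses, and the helper functions are all hypothetical and chosen only for illustration; this is not how Google Maps stores or exposes its data.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Place:
    """One row in a hypothetical standardized table of places."""
    name: str
    address: str
    opens: time
    closes: time


# Made-up sample records; every place uses the same fields.
places = [
    Place("Example Bakery", "12 Main St", time(7, 0), time(15, 0)),
    Place("Sample Diner", "480 Elm Ave", time(11, 0), time(23, 30)),
]


def closing_time(records, name):
    """Return the closing time for the named place, or None if unknown."""
    for place in records:
        if place.name == name:
            return place.closes
    return None


def open_at(records, moment):
    """Return the names of all places open at the given time of day."""
    return [p.name for p in records if p.opens <= moment < p.closes]


if __name__ == "__main__":
    print(closing_time(places, "Example Bakery"))
    print(open_at(places, time(12, 0)))
```

Because every record shares the same fields, a question like "when does this place close?" becomes a simple lookup. With free-form text on each site's contact page, the same question would need a custom parser per site.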
Even in your personal data, such as photos, there is now an intelligent metadata layer (when using Google or iCloud Photos) that has freed us from manually creating albums. Using computer vision and structured tagging such as location/geotagging, paired with the place data from Google Maps for example, you automatically have a highly structured photo inventory.
In the business world, it’s important to recognize the difference between structured and unstructured electronic data. Choosing to scan documents to go paperless is one step in the right direction. However, it is not the only step. You must incorporate structure into that data. When filing scanned or PDF documents, ensure you have sufficient metadata. Scanning and dropping files into a network share is only one small step removed from a paper filing cabinet, and I wouldn’t recommend it. For one thing, it likely won’t have enough security or a sufficient audit trail to ensure you don’t lose data.
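As a rough illustration of what "sufficient metadata" can mean for scanned files, the Python sketch below pairs each file path with a few descriptive fields and filters on them. The field names (`doc_type`, `customer`, `doc_date`), the sample paths, and the `find` helper are assumptions made for this example, not a prescribed schema or any particular product's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentRecord:
    """Hypothetical metadata stored alongside one scanned file."""
    path: str        # where the scanned file lives
    doc_type: str    # e.g. "invoice" or "contract"
    customer: str
    doc_date: date


# Made-up index entries for illustration only.
index = [
    DocumentRecord("scans/0001.pdf", "invoice", "Example Co", date(2020, 1, 15)),
    DocumentRecord("scans/0002.pdf", "contract", "Example Co", date(2020, 2, 3)),
    DocumentRecord("scans/0003.pdf", "invoice", "Sample Ltd", date(2020, 2, 20)),
]


def find(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    for record in find(index, doc_type="invoice", customer="Example Co"):
        print(record.path)
```

Without those fields, the only way to find "all invoices for one customer" in a network share is to rely on whatever folder and file naming habits people happened to follow.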
In your organization, where possible, avoid unstructured data. If you create a team shared folder, accept that it is going to become an unmanaged mess over time. It is a guaranteed outcome. There is value in having an unstructured location for collaboration and work-in-progress content, but I do recommend making it clear that this is the purpose of the space.
For final copies of work, I strongly recommend a proper document repository with security, audit trails, structured metadata and optical character recognition. Today, this requires some investment of dedicated resources to manage the information, but it is necessary to generate and maintain the corporate memory.
Keep your eye on the best and brightest in tech. They’re all working furiously to figure out how to harness the growing beast that is Big Data. There is simply too much for any of us to wield without some algorithmic assistance.