
Data as an Enabler part 2: Are you ignoring 80% of your data assets?

Organizations will not find their way to becoming data-driven unless they dive deep into their unique business data journey, argues Reija Nurmeksela in the second post of the Data as an enabler series.

Reija Nurmeksela / November 30, 2022

Typically, data and automation initiatives have focused largely on data stored in databases of various business systems.

However, according to IDC (source 1), only 20% of global data is predicted to live in databases as structured data in 2025. The remaining 80% resides outside databases, either as semi-structured data such as messages or as unstructured data such as documents.

In all parts of an organization, people send emails, use business applications, create contracts and countless other documents, order goods and services, and pay invoices. They browse websites and use mobile apps and internet-connected machines that send data to service providers and manufacturers. As a result, we create structured, semi-structured and unstructured data. Together it forms a complex data network, a business data web.

Recognize and automate

Unlocking the value of unstructured and semi-structured data would give you holistic insight into your business environment and, most importantly, into your customers and their needs. In addition, recognizing this data is a great stepping stone towards automating your business activities.

Automating the handling of unstructured data and converting parts of it into structured data would reduce routine tasks and, at the same time, increase efficiency in business activities, making people's everyday work smoother.

Let’s take an example. Interactions with customers and stakeholders often take unstructured or semi-structured forms. For example, your customer or business partner contacts you by email or sends you a document. Commonly, the customer name, document type and other data about the business transaction are then copied manually from the emails and documents and pasted into business systems.
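
The manual copy-and-paste step described above can be partly automated. The following is a minimal Python sketch, not a production solution: it reads a made-up email with the standard library's email module and pulls out the sender name, the sender address, a document type and an order reference. The sample message, the keyword rules and the "PO-" reference format are assumptions made for illustration only; real solutions typically rely on trained classifiers or document ingestion products.

```python
import re
from email import message_from_string
from email.utils import formataddr, parseaddr

# Hypothetical sample message, built here so the example is self-contained.
SENDER = formataddr(("Jane Doe", "jane.doe@example.com"))
RAW_EMAIL = (
    f"From: {SENDER}\n"
    "To: orders@example.org\n"
    "Subject: Purchase order PO-1042\n"
    "\n"
    "Hello, please find our purchase order PO-1042 attached.\n"
)


def extract_fields(raw: str) -> dict:
    """Turn a semi-structured email into a small structured record."""
    msg = message_from_string(raw)
    name, address = parseaddr(msg["From"])
    subject = msg["Subject"] or ""

    # Naive keyword rules standing in for a real document classifier.
    lowered = subject.lower()
    if "purchase order" in lowered:
        document_type = "purchase_order"
    elif "invoice" in lowered:
        document_type = "invoice"
    else:
        document_type = "unknown"

    # Assumed reference format "PO-<digits>"; adjust to your own numbering.
    match = re.search(r"\bPO-\d+\b", subject)

    return {
        "customer_name": name,
        "customer_email": address,
        "document_type": document_type,
        "reference": match.group(0) if match else None,
    }


fields = extract_fields(RAW_EMAIL)
print(fields)
```

The resulting dictionary is the kind of structured record that could be passed on to a business system instead of being typed in by hand.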

We can improve digital information supply chains like the one above with a variety of solutions that help us utilize all forms of data across business and data domains. Modern automation and content service solutions, data platforms, and recent developments in computational linguistics, machine learning and AI provide a better technology base for utilizing all forms of data at large scale. As a result, the traditionally separate categories of structured and unstructured data (the latter better known as “content”) are gradually converging.

When data moves, semi-structured data also comes into play, because it provides the transfer framework for the data. Organizations therefore need to build competencies that cover the intersection of the various forms of data.

 

At the end of 2021, there were close to five billion global internet users (source 2) connected to the web of data. The volume of data produced in the world is practically doubling every 18 months. It is estimated that there will be 180 zettabytes of data in 2025 (source 3). In addition, the variety of data is expanding: the sources and users of data are heterogeneous, which calls for an increasing focus on all forms of data.
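
To get a feel for what "doubling every 18 months" means, the short Python sketch below computes the implied growth factor over a given number of months. It only restates the doubling claim as arithmetic, assuming steady exponential growth; the function name is made up for this illustration and the output is not a forecast.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Growth multiple after `months`, assuming steady doubling every period."""
    return 2 ** (months / doubling_period_months)


# One year of growth at this pace is roughly a 1.6-fold increase,
# and three years (two doubling periods) is a four-fold increase.
print(round(growth_factor(12), 2))
print(growth_factor(36))
```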

Find your own path to data-driven business

So, how can your organization become data-driven – an organization where decision-making is based on structured, semi-structured and unstructured data and where the automatic flow between structured, semi-structured and unstructured data is a natural part of everyday work of people and systems?

Here are my three tips:

1 Secure your structured data

It is said that data is the world’s most valuable and, at the same time, most vulnerable resource. Structured databases are large collections of the critical data that enables your business. The interest in harming or monetizing that data is what makes them so vulnerable.

  • Identify the most critical and purposeful data for your business, and how the data is inserted and used in your business activities. These are the most critical data activities in your business.
  • Focus on the critical data items that overlap not only between the numerous databases of business systems but also with unstructured documents and semi-structured messages. These should be in focus when you think about automating the data flows in your digital information supply chains.
  • Assess your data storage and processing environments from the privacy and cybersecurity points of view, particularly to guarantee that only authorized persons and systems can add, modify and delete the data.

2 Give up the A4 tradition of documents and automate content creation

Organizations are digitalizing and transforming their document-centric processes and workflows towards AI-supported orchestration and decisioning. IDC predicts that by 2026, 75% of the largest enterprises in the world will have completely digitalized their unstructured data processes (source 4).

  • Identify work processes that rely heavily on unstructured documents and integrate document ingestion solutions with business systems. These are the unstructured data activities where you should reduce paper-based work and reliance on manual efforts.
  • Rethink whether the form of digital business information for human use could be something more dynamic than an unstructured PDF document in A4 format. Could it be visualized as dashboards or as visually flexible combinations of text and images on modern websites and mobile apps? Turning your unstructured data into semi-structured data gives a better user experience on devices such as smartphones.
  • The content of business documents consists of structured data from databases, reused content and new content. Gartner has predicted that by 2024, 80% of new documents and correspondence will contain recycled content and content added via auto-completion (source 5). Connecting structured and unstructured data and automating document creation reduce routine tasks and decrease errors caused by manual data copying.

3 Move your semi-structured data fast and securely

The emergence of websites, mobile applications and IoT has multiplied the amount of semi-structured data. Nowadays, a vast number of data pipelines extract, transfer, and load data for reporting and real-time data streams.

  • Identify the data streams that are critical to building responsive business processes and rich customer experiences in real time. Is the data moving fast enough? What if the data were available for data analysis and other needs with less movement, as it is on modern cloud data platforms? The more you move your data from one location to another, the harder it becomes to secure and protect it and to guarantee its integrity.
  • Identify business transactions where the same data is duplicated or multiplied across several flows. For example, as long as digital or even paper documents are used in business transactions, the same data flows between business applications and systems in more than one form and format, such as semi-structured JSON in parallel with an unstructured PDF or image. Unnecessary data transfer is not only a security issue but also increases energy consumption.

How about cooperation between business and IT people?

All data forms have their own special characteristics, which require specific expertise in the business and use context as well as technical means and tools. Knowing these characteristics is also important for businesspeople, because the business owns not only the data but also the flow of data between different data forms in business activities.

So, the connection between structured, semi-structured and unstructured data always requires cooperation between data, IT and business. It is great to see that many businesspeople are busy building their data literacy skills and better ability to “think data”.

Diving deep into your unique business data journey might still be difficult, particularly as it requires tight cooperation between technology and business-oriented people and a common language over business and data that everybody understands.

If you need advice, business and data consultants can guide you and your organization to become data-driven. In my work, I help customers to identify connections, critical data items and flows between different data forms and various business systems when modernizing data handling in business processes.

If you would like to remember one key takeaway about building a cohesive base for decision-making from all forms of data, it is this:

“Don’t try to separate data and information and content. They are becoming so integrated and so ingrained in one another. There’s really a lot of overlap, so don’t try too hard to separate out structured and unstructured information because really, it’s all just data.”

- Expert Panel Perspectives
AIIM - The Association for Intelligent Information Management (source 6)

 

The ABC of structured, semi-structured and unstructured data

In the following, I will highlight some of the most common business applications and go through key features of various data forms.

Your business is based on technological systems supporting your and your customers' business activities, such as:

  • CRM for your customer relationship management,
  • PIM for managing your products,
  • ERP for planning your enterprise resources wisely,
  • SCM for supporting the delivery chain to your customers,
  • CM for managing cases and documents in content intensive businesses,
  • CMS for managing content on web sites and content marketing,
  • DM and RM for supporting the life cycle of your documents and business records,
  • Accounting systems for tracking income and expenses,
  • HRM for supporting human resource activities,

as well as many more advanced systems to support various business needs.

Let's dive a bit deeper into the data forms that live inside and between these various data domains, systems, and applications.


Structured data:
This is data for business transactions that
  • is located in the relational databases of operational business applications, such as CRM, PIM, ERP, SCM, or CM, in on-premises or cloud environments,
  • follows predefined data models and schemas, and is thus quite straightforward to create, utilize and analyse. However, in my experience, data-level variations in data definitions and values between different business applications must be taken seriously in practical data projects,
  • consists of clearly defined data types with patterns that should make it easily searchable with various tools and technologies,
  • is typically machine-generated (that is, generated by software applications),
  • in many cases has a long life cycle, although the life cycle varies a lot, from temporary to permanent.

Unstructured data:
This means documents and content created as business records or for other human interaction. The data
  • may be located in file folders, non-relational databases, as BLOB (Binary Large Object) entities in relational and other databases, on websites and on social media platforms,
  • is technically defined as data in absolute raw form, i.e. binary data,
  • is not structured via predefined data models or schemas, but has an internal structure that requires cognitive capabilities (human or AI) to understand,
  • is mainly human-generated, but generation by software and artificial intelligence applications is increasing,
  • requires advanced search capabilities and a proper software application for visual or auditory presentation of the data,
  • varies a lot in its life cycle, from temporary to permanent,
  • is in many cases redundant; for instance, it exists in many versions, presentation formats, copies, and variants.

Semi-structured data:
This is sensor data, application logs, messages, Excel sheets and structured content for data interchange, such as JSON and XML files, that
  • may be located in IoT streams, file folders, non-relational databases, or as BLOB (Binary Large Object) entities in relational databases,
  • is structured via a predefined syntax but in many cases may lack a predefined schema,
  • is mainly machine-generated, but humans may also generate structured content, for example when content authoring is based on structured (XML) documents,
  • typically requires a proper software application for data creation and processing,
  • typically has a short and temporary life cycle, except in the case of structured (XML) documents.
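
The difference between "predefined syntax" and "predefined schema" can be illustrated with a small Python sketch. The sensor messages, field names and column list below are invented for this example. Every message is valid JSON, so the syntax is fixed, but the messages do not share the same set of fields. Turning them into structured rows therefore means choosing the columns yourself and deciding how to handle missing values.

```python
import json

# Hypothetical sensor messages: valid JSON, but no shared schema.
RAW_MESSAGES = """
[
  {"sensor_id": "pump-1", "temperature_c": 71.5, "timestamp": "2022-11-30T08:00:00Z"},
  {"sensor_id": "pump-2", "vibration_mm_s": 3.2, "timestamp": "2022-11-30T08:00:05Z"},
  {"sensor_id": "pump-1", "temperature_c": 72.1}
]
"""

# The target structure is our own choice, as it would be in a database table.
COLUMNS = ["sensor_id", "timestamp", "temperature_c", "vibration_mm_s"]


def to_rows(raw: str) -> list[dict]:
    """Map each JSON record onto the fixed columns; absent fields become None."""
    records = json.loads(raw)
    return [{column: record.get(column) for column in COLUMNS} for record in records]


rows = to_rows(RAW_MESSAGES)
for row in rows:
    print(row)
```

In this sketch, missing fields are simply filled with None. In a real data flow, whether to reject, default or flag such records is a business decision as much as a technical one.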

 

Data as an enabler blog series

In this blog series, my colleagues and I provide perspectives on data as an enabler for digital business. We hope to give you something interesting to ponder, helping you to conquer challenges and find new opportunities in data. How do you identify the purposeful data for your business? What capabilities and tools are needed to capitalize on it? How do you get data, IT and businesspeople to cooperate? How do you derive value from data, your organization's most valuable yet too often neglected asset? Stay tuned!

 

We at Tietoevry Create are experts in breaking information silos and bridging the gap between strategy and implementation. Our services span from business advisory and data architecture design to agile data development and operations ensuring value creation from data.

If you want to get your data in order and say goodbye to data silos, do not hesitate to reach out.

Our team is ready to help!

Reija Nurmeksela
Tietoevry alumni

Reija is experienced in improving data, information and content supply chains in complex environments. She helps her customers to rethink and automate data-intensive business processes and often works as a translator between business and technology. It is important to her to find purposeful solutions to business problems that can be solved with data.
