Data management in the world of Machine Learning, part I
11.04.2022 | Ewa Suszek

Motivations

Today we have the opportunity to improve and optimize virtually any process, in any sector of the economy, based on data and on an unprecedented scale. Collecting, analyzing, identifying trends and correlations, alerting, learning – these tasks can now be carried out at low cost compared to what was possible just over a decade ago.

We want to rely more on data when making business decisions.

Data-driven decision making only works when the data is ready to use at the moment a decision has to be made. This means that data development must be planned in advance for specific parts of the operating model, and consistently managed and shared. Looking at the most successful global companies of the past decade, they are the ones that have excelled at using data to drive business. Data management has moved to the forefront of the discussion. We have realized that artificial intelligence brings new requirements for the use and management of data if we want to apply machine learning mechanisms – and to apply them in a mass, automated manner.

We will now take a look at how one can improve the approach to data use and management. This may require organizational change, but in the long run there is no going back. Organizations decide to change because what was not feasible a decade ago – too much burden for too little return – is now supported by appropriate methods and tools. Today most enterprises identify data management as a very important part of their strategy, but usually because data mismanagement is risky (loss of business continuity, security breaches, GDPR non-compliance, etc.). These are good reasons to prioritize it, but today we are taking it a step further.

Today’s direction

  • Democratization of access and data models

Using data on a large scale requires greater supervision of the data. We are talking here about the “democratization” of access to and use of data throughout the enterprise, and the implementation of tools that put data into the hands of many, not just a few expert groups. This idea needs to be supported by process automation and ease of use. Democratization is currently observed not only in access to data, but also in the creation of data management models and of artificial intelligence models built on top of that data.

  • Data management based on cooperation between IT and business stakeholders

Why is it so important that data management is based on collaboration between IT and business? Because the skill sets behind these two management perspectives are different. Data managers bring expertise in data architecture, privacy, integration, and modeling. Those on the information management side, however, should be business experts who can answer: What is the data? Where does it come from? How and why is the data valuable to the company? How can it be used in different business contexts? How should it ultimately be used?

  • From Data to Information to Knowledge

Data quality is the foundation:

As we say in the industry: “garbage in, garbage out”. To achieve a higher degree of utilization and automation, let’s take care of the basics:

  • data is technically clean and consistent;
  • data is accurate and, at the same time, relevant (we do not collect “garbage”, and we eliminate the “noise”);
  • data is representative and useful for business purposes, drawn from various (broadly understood) sources;
  • data is collected over well-chosen periods, and at appropriate – and, where needed, different – resolutions/granularities.
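The basics above can be sketched in a few lines of Python with pandas; the column names, values, and plausibility thresholds below are purely illustrative assumptions, not a real schema:

```python
import pandas as pd

# Hypothetical sensor readings; names and thresholds are assumptions.
df = pd.DataFrame({
    "sensor_id":   [1, 1, 2, 2, 2],
    "temperature": [21.5, 21.5, None, 19.8, 500.0],
})

# Technical cleanliness: how many values are missing per column?
missing_per_column = df.isna().sum()

# Eliminating "noise": keep only physically plausible readings
# (rows with a missing temperature are dropped here as well).
plausible = df[df["temperature"].between(-40, 60)]

# Integrity: drop exact duplicate rows.
clean = plausible.drop_duplicates()
```

Real pipelines would encode such rules as reusable, monitored validation steps rather than ad-hoc filters, but the idea is the same.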

Metadata:

  • Projects to modernize the data infrastructure and to cover basic data needs – data acquisition, closing migration projects to the cloud, implementing a data lake, configuring new BI tools, etc. – have released a lot of potential, but this still needs to be organized to avoid chaos;
  • One needs to build information by answering contextual questions such as: “What does this column name or attribute actually mean?”, “What are the relationships between the individual fields?”, “What are the minimum, critical, and maximum values?”, etc.
  • While these questions are not new, there is now a need for a systematic approach to organizing metadata.
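As a sketch of what a systematic answer to these questions might look like, here is a minimal, hand-rolled metadata record in Python; the table, column names, values, and relationships are all hypothetical, not taken from any specific catalog product:

```python
# One metadata record answering the contextual questions for a column:
# what it means, its minimum/critical/maximum values, and its
# relationships to other fields. All names and values are invented.
metadata = {
    "table": "orders",
    "columns": {
        "amount": {
            "description": "Gross order value in PLN",
            "min": 0.0,           # minimum allowed value
            "critical": 10000.0,  # threshold that triggers manual review
            "max": 50000.0,       # maximum allowed value
            "related_to": ["invoices.total"],  # relationship to another field
        },
    },
}
```

In practice such records live in a metadata catalog rather than in code, but the structure of the answers is the same.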

Quality profiling:

  • Data profiling is the process of examining data to understand its content and structure, to check its quality, and to determine how it will be used in the future.
  • Profiling can occur several times throughout the data asset lifecycle, from shallow to in-depth assessment. It includes calculating missing values, minima and maxima, the median, critical (threshold) values, frequency distributions, and other key statistical indicators that help users understand the essential quality of the data.
  • Incorporated into the data stack as a context function, this metadata enables end users to understand and trust the information.
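A shallow profiling pass of this kind can be sketched in a few lines of Python with pandas; the data asset and its columns are invented for illustration:

```python
import pandas as pd

# An invented data asset; column names and values are assumptions.
orders = pd.DataFrame({
    "amount":  [10.0, 25.5, None, 40.0, 25.5],
    "country": ["PL", "PL", "DE", "DE", "PL"],
})

# A shallow profile: missing values, minimum, maximum, median,
# and a frequency distribution for a categorical column.
profile = {
    "missing": int(orders["amount"].isna().sum()),
    "min":     orders["amount"].min(),
    "max":     orders["amount"].max(),
    "median":  orders["amount"].median(),
    "freq":    orders["country"].value_counts().to_dict(),
}
```

Such a profile would typically be computed per data asset and stored alongside its metadata, so the statistics travel with the data they describe.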


Data-driven companies – those able to make better decisions on the basis of data – are now the most successful; they are highly valued and compete at the highest level. If you want to be successful, it is worth improving your approach to data use and management, taking today’s trends into account.

