Data management - data profiling
20.07.2021

Once the data sources have been catalogued, the key data should be defined for each of them. What counts as key data depends on the source’s environment: for MS Excel it is the worksheets, for relational databases the tables, for websites the HTML files, and so on. The list of sources and the key data they contain may also include PDF files, and it serves as the foundation for further work.

What sets the Precisely tools apart is the ability to define and manage relationships between data. Context is becoming increasingly important as more and more data are interconnected. Thanks to solutions based on graph databases, we can store and manage information together with the connections between its elements. This contextual view lets us observe how a change to a given piece of data affects its surroundings: changing a record interacts with every node with which that record has a direct or indirect relationship. These capabilities are implemented, for example, in the Data Lineage module.
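The idea of tracing direct and indirect relationships can be illustrated with a minimal sketch in plain Python. It is not how the Data Lineage module is implemented; the node names and the `downstream` helper are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage graph: each node feeds the nodes listed in its value.
LINEAGE = {
    "crm.client": ["dwh.customer"],
    "dwh.customer": ["report.sales", "report.churn"],
    "erp.invoice": ["report.sales"],
    "report.sales": [],
    "report.churn": [],
}


def downstream(graph, start):
    """Return every node directly or indirectly reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # The changed node itself is not part of its own impact set.
    return seen - {start}


if __name__ == "__main__":
    # Nodes affected by a change to the (hypothetical) CRM client record.
    print(sorted(downstream(LINEAGE, "crm.client")))
```

A breadth-first walk is used here only because it is short and handles cycles; a graph database would answer the same question with its own traversal query.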

If the structure changes, the model is rebuilt.

The current positioning of the Precisely Spectrum tools in the contextual approach is shown in the Forrester Wave diagram below:

Data standardization

Data standards are defined in parallel with the inventory work. Formatting standards (dates, phone numbers, identification numbers, etc.) should be set, along with other validation rules (for example, proper names should always be given in the nominative form, without diminutives).

As mentioned above, this stage falls within the competence of the business side, as only the direct data users know which data are most essential. The same people should define the required formats and standards, as well as future validation rules. IT personnel should help identify the sources and determine the connections that result from the database structures. Because this stage is so important, AZ Frame offers extensive support in defining key data processes. A very important aspect of Data Management in an organisation is establishing cohesive data standards. Reference data sets can help here, as they allow objects (e.g. a client) to be described in a consistent and unambiguous way.

Examples of definitions of standards:

  • a phone number should consist of a prefix (the "+" sign and the country code; for Poland it is "+48") and the individual number (in Poland it is 9 digits, for both mobile and fixed-line phones). If the prefix is other than "+48", it may be any number of up to 4 digits. There are no spaces between the digits,
  • the name of a street in Poland must correspond to the TERYT dictionary,
  • people’s first names should match the official dictionary of names. No diminutives are accepted.
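The phone number rule above can be sketched as a small validator. This is only an illustration in plain Python, not a Glossary module rule; the function name is hypothetical, the prefix and the individual number are assumed to be stored as separate fields, and the "maximum 4-digit" clause is read as applying to the country code. The standard does not state a length for non-Polish individual numbers, so the sketch only requires them to be digits.

```python
import re

# "+" followed by a country code of 1 to 4 digits (assumed reading of the rule).
PREFIX_RE = re.compile(r"\+[0-9]{1,4}")
# Polish individual numbers: exactly 9 digits, no spaces.
POLISH_NUMBER_RE = re.compile(r"[0-9]{9}")
# Other countries: digits only; the length is not specified in the standard.
FOREIGN_NUMBER_RE = re.compile(r"[0-9]+")


def is_valid_phone(prefix: str, number: str) -> bool:
    """Check a (prefix, number) pair against the example standard."""
    if not PREFIX_RE.fullmatch(prefix):
        return False
    if prefix == "+48":
        return POLISH_NUMBER_RE.fullmatch(number) is not None
    return FOREIGN_NUMBER_RE.fullmatch(number) is not None


if __name__ == "__main__":
    print(is_valid_phone("+48", "123456789"))    # well-formed Polish number
    print(is_valid_phone("+48", "123 456 789"))  # contains spaces
```

`fullmatch` is used so that the whole field has to satisfy the pattern, which is what rejects embedded spaces.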

With the Glossary module, we can also set our own standards and rules governing the data.

An API is also available within the system; with it we can manage the reference data sets (updating, adding, and downloading a list or a set).

Data modelling

The data modelling module is available in Precisely Discovery. We can create physical models, based on predefined connections (generated automatically to reflect the data structures in the sources), and logical models (based on entity-relationship schemes), which organise the data resources of an organisation. For instance, the target logical model should include one diagram describing a client and one diagram describing the client’s addresses.

In the organisation’s source databases (and their respective models), there may be several diagrams storing client data and several more storing addresses. Modelling serves to project the source data onto the target model.
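Projecting several source structures onto one target entity can be pictured with a short sketch. It is plain Python rather than the Precisely Discovery modelling module, and the source names, column names, and the `to_target` helper are all hypothetical.

```python
# Hypothetical mapping for the target "client" entity:
# source table -> {target attribute: source column}.
CLIENT_MAPPINGS = {
    "crm.customers": {
        "client_id": "cust_no",
        "first_name": "fname",
        "last_name": "lname",
    },
    "billing.payers": {
        "client_id": "payer_id",
        "first_name": "name",
        "last_name": "surname",
    },
}


def to_target(source: str, row: dict) -> dict:
    """Project one source row onto the target logical 'client' entity."""
    mapping = CLIENT_MAPPINGS[source]
    # Columns that are not mapped (e.g. technical fields) are dropped.
    return {target: row[column] for target, column in mapping.items()}


if __name__ == "__main__":
    crm_row = {"cust_no": 1, "fname": "Anna", "lname": "Nowak", "segment": "B2C"}
    billing_row = {"payer_id": 7, "name": "Jan", "surname": "Kowalski"}
    print(to_target("crm.customers", crm_row))
    print(to_target("billing.payers", billing_row))
```

Both rows end up with the same attribute names, which is the point of a single target model: downstream consumers no longer need to know how each source names its columns.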

With the logical data model we can visualise the data and use the federation functionality. These models can connect to numerous data sources and systems, providing a real-time view of our data without the need to extract it from the sources.

Both "physical" and "logical" models can be subject to data profiling. "Logical" models can also be made available to other applications and systems.

 

Author: Ewa Suszek