This stage involves physically connecting to the data sources, discovering their data models, and verifying the data types stored in the columns. It consists of:
- finding the relevant data sets,
- defining the semantics for each model.
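As a rough illustration of what connecting to a source and reading its data model means, the sketch below lists tables, columns and declared column types. SQLite is used only as a convenient stand-in for a real data source, and `discover_model` is a hypothetical helper name, not part of the Discovery platform.

```python
import sqlite3


def discover_model(conn):
    """Return {table: [(column, declared_type), ...]} for a SQLite connection.

    Assumption: SQLite stands in for the data source; a real platform
    would use its own connectors.
    """
    model = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        model[table] = [(col[1], col[2]) for col in columns]
    return model


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE customers (id INTEGER, second_name TEXT)")
    print(discover_model(connection))
```

The declared types returned here are only what the source claims about each column; checking what the values actually look like is a separate step.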
A feature that sets the Discovery platform apart from others is the ability to define semantics, i.e. the syntax rules that a specific kind of data must follow. For instance, we can define the conditions for discovering the “ID Card” data type (card number: 3 capital letters and 6 digits; personal identification number: 11 digits, of which the third and fifth are always 0, 1, 2 or 3; etc.).
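The two conditions above can be written down as pattern checks. The following is a minimal Python sketch, not Groovy and not the platform's rule format; the names `ID_CARD`, `PERSONAL_ID`, `matches_id_card` and `matches_personal_id` are hypothetical, and the patterns encode only the conditions stated in the text (no checksum or date validation).

```python
import re

# Card number: exactly 3 capital letters followed by 6 digits.
ID_CARD = re.compile(r"[A-Z]{3}[0-9]{6}")

# Personal identification number: 11 digits, where the third and
# fifth digits are restricted to 0, 1, 2 or 3.
PERSONAL_ID = re.compile(r"[0-9]{2}[0-3][0-9][0-3][0-9]{6}")


def matches_id_card(value: str) -> bool:
    """True if the whole value has the card-number shape."""
    return ID_CARD.fullmatch(value.strip()) is not None


def matches_personal_id(value: str) -> bool:
    """True if the whole value has the personal-ID shape."""
    return PERSONAL_ID.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    print(matches_id_card("ABC123456"))
    print(matches_personal_id("44051401359"))
```

`fullmatch` is used so that a longer string that merely contains a matching fragment is not accepted.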
The system offers extensive possibilities for defining semantic methods thanks to its use of the Groovy language. Groovy is a scripting language modelled on Java syntax, which means that it is simple, that many people are able to program in it, and that it offers great flexibility for creating rules within the Precisely Spectrum environment.
Thanks to the data discovery functionality, the system can recognize data structures and types in a defined source, using predefined semantics to identify the types correctly. These mechanisms let us track down the columns (in schemas or files) that contain e.g. personal identification numbers, names, surnames, e-mail addresses, phone numbers, dates etc. If the system finds a string of characters consisting of 3 letters and 6 digits, it will indicate that the column may contain ID Card numbers. If it finds a string that starts with “+48” and is followed by 9 digits, it will indicate that the column may contain Polish phone numbers. If the semantic types are not defined, the system will not assign the data to the “ID Card” or “Polish phone numbers” categories.
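The matching logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's algorithm: the `SEMANTIC_TYPES` registry, the `discover_column` function and the 80% `threshold` are all hypothetical, chosen only to show how a column is matched against whichever semantic types happen to be defined.

```python
import re

# Hypothetical registry of semantic types, built from the two
# examples in the text (3 capital letters + 6 digits; "+48" + 9 digits).
SEMANTIC_TYPES = {
    "ID Card": re.compile(r"[A-Z]{3}[0-9]{6}"),
    "Polish phone numbers": re.compile(r"\+48[0-9]{9}"),
}


def discover_column(values, semantic_types=SEMANTIC_TYPES, threshold=0.8):
    """Return the names of semantic types that a column may contain.

    A type is suggested when at least `threshold` of the non-empty
    values match its pattern (the threshold is an assumed parameter).
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    suggestions = []
    for name, pattern in semantic_types.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= threshold:
            suggestions.append(name)
    return suggestions


if __name__ == "__main__":
    print(discover_column(["ABC123456", "XYZ654321"]))
    print(discover_column(["+48123456789", "+48987654321", ""]))
    # With no semantic types defined, nothing can be suggested.
    print(discover_column(["ABC123456"], semantic_types={}))
```

The last call mirrors the final point of the paragraph: when no semantic type is defined, the column is not assigned to any category, however regular its values are.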
After discovering the data, we can tag the respective schemas and columns. This makes it possible to search for all fields marked with particular tags (e.g. a second name can be tagged as “client’s data”, “name”, “second name”, “personal data” etc.).
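Tagging and tag-based search amount to a many-to-many mapping between fields and tags. A minimal sketch of that idea follows; the `TagIndex` class and the field names such as `customers.second_name` are hypothetical and serve only as illustration.

```python
from collections import defaultdict


class TagIndex:
    """Maps tags to the fields (e.g. "table.column") that carry them."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, field, *tags):
        """Attach one or more tags to a field."""
        for name in tags:
            self._by_tag[name].add(field)

    def find(self, *tags):
        """Return the set of fields that carry all of the given tags."""
        if not tags:
            return set()
        # .get avoids creating empty entries for unknown tags.
        matches = [self._by_tag.get(name, set()) for name in tags]
        return set.intersection(*matches)


if __name__ == "__main__":
    index = TagIndex()
    index.tag("customers.second_name",
              "client's data", "name", "second name", "personal data")
    index.tag("customers.email", "client's data", "personal data")
    print(index.find("personal data"))
    print(index.find("personal data", "name"))
```

Because one field can carry several tags, the same column is found whether the search is broad ("personal data") or narrow ("second name").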
Once the verification stage is complete, we have connections to all predefined columns of the data sources, and we know both the data structures in these sources and the overall content of the columns.