Design Patterns for Data Integration
In the context of data warehousing, data engineering and related disciplines, each data integration process initially looks unique. Specific data structures in sources and targets are connected via a sequence of transformations, and everything appears individual at first.
If you look at a large number of such data integration processes from a distance, a few patterns often become visible. Many projects have certainly already made this observation. But how can we benefit from this?
Metadata Normalization
A closer look at these data integration processes reveals that they have a lot in common: the basic logic and transformations are the same, as is the data modeling method used (e.g., Data Vault, Dimensional Modeling). Other parts, however, are highly specific; in particular, the processed data structures differ from process to process.
Metadata normalization separates the common parts of the processes from the specific ones. The common parts are implemented once as a design pattern; the specific parts are extracted from each process as so-called instance metadata.
Design Pattern
A pattern is first designed, implemented and tested as an abstract prototype, either in a data integration tool or as an SQL/Spark template. The prototype is then imported into MetaKraftwerk and supplemented with dynamic rules that define where instance metadata will later be inserted and used. The pattern also defines naming rules and other dynamic components, such as DDL templates. When developing a pattern, there are no limits to creativity; its functionality is based entirely on the needs of the customer project.
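For illustration, here is a minimal sketch of what such a template could look like, written as an SQL string with placeholders. The placeholder syntax, the names and the satellite-load logic are assumptions made for this example, not MetaKraftwerk's actual notation:

```python
# Hypothetical pattern: a generic "load satellite" template.
# Placeholders in curly braces mark the spots where instance metadata
# will be inserted later; names and syntax are illustrative only.
SATELLITE_LOAD_PATTERN = """
INSERT INTO {target_schema}.{target_table} ({hash_key}, load_dts, {payload_columns})
SELECT {hash_key}, CURRENT_TIMESTAMP, {payload_columns}
FROM {staging_schema}.{staging_table} stg
WHERE NOT EXISTS (
    SELECT 1 FROM {target_schema}.{target_table} sat
    WHERE sat.{hash_key} = stg.{hash_key}
);
"""
```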
Instance Metadata
Instance metadata is derived from data models, source systems, specifications and other metadata. Unlike a plain data model, instance metadata is supplemented by so-called functional roles that assign each element a function in the pattern. The schema of the instance metadata can be defined flexibly, so it can be tailored to the technical requirements of the pattern.
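A minimal sketch of what instance metadata with functional roles could look like; the schema, role names and column names are illustrative assumptions:

```python
# Hypothetical instance metadata for one concrete process instance.
# Each column is tagged with a functional role that tells the pattern
# how to use it; the role names and the schema are assumptions.
customer_satellite_metadata = {
    "target_schema": "core",
    "target_table": "sat_customer",
    "staging_schema": "stage",
    "staging_table": "stg_customer",
    "columns": [
        {"name": "customer_hk", "role": "hash_key"},
        {"name": "first_name",  "role": "payload"},
        {"name": "last_name",   "role": "payload"},
        {"name": "email",       "role": "payload"},
    ],
}
```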
Development Automation
MetaKraftwerk creates directly deployable and executable processes from the pattern and the instance metadata. The functionality already tested in the pattern is automatically transferred to the concrete process instances. This automation accelerates development enormously and leads to a consistent, standardized quality of the generated processes.
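Continuing the sketches above, generation then boils down to filling the pattern with each instance's metadata. The rendering logic below is an assumption about how such generation could work in principle, not a description of MetaKraftwerk internals:

```python
def render_process(pattern: str, metadata: dict) -> str:
    """Fill a pattern template with instance metadata (illustrative only)."""
    payload = ", ".join(
        c["name"] for c in metadata["columns"] if c["role"] == "payload"
    )
    hash_key = next(
        c["name"] for c in metadata["columns"] if c["role"] == "hash_key"
    )
    return pattern.format(
        target_schema=metadata["target_schema"],
        target_table=metadata["target_table"],
        staging_schema=metadata["staging_schema"],
        staging_table=metadata["staging_table"],
        hash_key=hash_key,
        payload_columns=payload,
    )

# Uses SATELLITE_LOAD_PATTERN and customer_satellite_metadata from the
# sketches above; one call per process instance yields a deployable statement.
print(render_process(SATELLITE_LOAD_PATTERN, customer_satellite_metadata))
```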
Learn how a pattern- and metadata-based development approach pays off for your data management project
Design Pattern Library
MetaKraftwerk has an extensive library of proven design patterns for a wide range of applications. Benefit directly from the quality and standardization of these patterns, or adapt them to your own needs.
Big Data
Patterns for poly-structured or unstructured data, designed for the high-performance processing of large data volumes. Data quality can, for example, be checked directly during processing, so that mass data does not have to be moved several times unnecessarily. The integration of the data into the various layers of a Big Data platform is realized via specific patterns, and patterns also exist for real-time Big Data architectures such as Lambda and Kappa.
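As a rough illustration of such an in-flight quality check, here is a generic PySpark sketch; the paths, column names and rules are hypothetical and not a pattern shipped with the library:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ingest_with_quality_check").getOrCreate()

# Read the raw mass data once and validate it during the same pass,
# so invalid records are diverted without an extra movement of the data.
raw = spark.read.json("s3://landing/events/")  # hypothetical path
valid = raw.filter(F.col("event_id").isNotNull() & (F.col("amount") >= 0))
rejected = raw.subtract(valid)

valid.write.mode("append").parquet("s3://curated/events/")
rejected.write.mode("append").parquet("s3://quarantine/events/")
```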
Realtime Data Warehousing
Patterns for the real-time processing of streaming or messaging source systems. Real-time processing requires specialized processing logic for technical and business integrity checks, as well as the direct integration of the data into the core layer of the data warehouse. These patterns are also used in the patterns for real-time Big Data architectures.
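A minimal sketch of such a streaming integration with Spark Structured Streaming; the Kafka topic, columns and target table are hypothetical, and the integrity check is reduced to a simple filter:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("realtime_core_load").getOrCreate()

# Read messages from a streaming source (here: a hypothetical Kafka topic).
orders = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
)

# Technical integrity check before loading into the core layer.
checked = orders.filter(F.col("payload").isNotNull())

# Continuous load into a core-layer table (Delta format assumed).
query = (
    checked.writeStream.format("delta")
    .option("checkpointLocation", "/checkpoints/orders")
    .outputMode("append")
    .toTable("core.orders_stream")
)
query.awaitTermination()
```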
Mass Data Ingestion
Patterns for landing or moving large amounts of source data into the corresponding data platform. This is particularly useful for the rapid migration of data, so that a move to a cloud, on-premises or hybrid data platform can be carried out efficiently and the data becomes available for data science and data analytics as quickly as possible.
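A minimal sketch of such a landing step; the connection details, source table and target path are hypothetical, and a generated pattern instance would typically take them from instance metadata:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mass_data_landing").getOrCreate()

# Read a large source table in parallel over JDBC (hypothetical connection).
source = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://source-db:5432/sales")
    .option("dbtable", "public.orders")
    .option("partitionColumn", "order_id")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "32")
    .load()
)

# Land the data unchanged in the landing zone of the target platform.
source.write.mode("overwrite").parquet("s3://landing/sales/orders/")
```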
Data Quality
Various patterns that check data and value ranges, naming conventions, minimum and/or maximum requirements for measures, and cross-references to other entities.
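A minimal sketch of how such rule-based checks could be generated from metadata; the rule schema and table names are hypothetical:

```python
# Hypothetical quality rules, one entry per check; in a metadata-driven
# setup they would be part of the instance metadata of a pattern.
quality_rules = [
    {"table": "core.orders", "column": "amount",
     "check": "BETWEEN 0 AND 1000000"},
    {"table": "core.orders", "column": "customer_id",
     "check": "IN (SELECT customer_id FROM core.customers)"},
]

def build_check_sql(rule: dict) -> str:
    """Return an SQL statement that counts violations of one rule."""
    return (
        f"SELECT COUNT(*) AS violations FROM {rule['table']} "
        f"WHERE NOT ({rule['column']} {rule['check']})"
    )

for rule in quality_rules:
    print(build_check_sql(rule))
```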
Data Vault Modeling
Patterns for the technical implementation of Data Vault modeling, with compact, historized data storage built from model building blocks such as hubs, links and satellites. Especially suitable for hybrid and federated data platforms that require flexibility with regard to extensions and multi-source scenarios.
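As an illustration of the corresponding DDL templates, here is a sketch for a hub and a satellite; column names, data types and the hash-key convention are assumptions for this example:

```python
# Hypothetical DDL templates for a hub and its satellite; a pattern would
# fill the placeholders from instance metadata, as sketched further above.
HUB_DDL = """
CREATE TABLE {schema}.hub_{entity} (
    {entity}_hk     CHAR(32)     NOT NULL,  -- hash key
    {business_key}  VARCHAR(100) NOT NULL,
    load_dts        TIMESTAMP    NOT NULL,
    record_source   VARCHAR(50)  NOT NULL,
    PRIMARY KEY ({entity}_hk)
);
"""

SATELLITE_DDL = """
CREATE TABLE {schema}.sat_{entity} (
    {entity}_hk     CHAR(32)     NOT NULL,
    load_dts        TIMESTAMP    NOT NULL,
    {payload_columns},
    PRIMARY KEY ({entity}_hk, load_dts)
);
"""

print(HUB_DDL.format(schema="core", entity="customer", business_key="customer_no"))
```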
Anchor Modeling
Pattern for data integration according to anchor modeling, with the goal of storage and access-oriented historization of data in the core layer of the data warehouse. Master data is stored in an object-state data structure with a fixed object core and variable attributes in state tables. Time-related data is stored directly in fact tables.
Multidimensional Modeling
Patterns for providing data with a focus on data analysis in data marts or, in the broader sense, Online Analytical Processing (OLAP). Multidimensional data spaces are created from measures and dimensions. Modeling can be done using star, snowflake and galaxy schemas, among others; the transfer of these modeling types into concrete technical artifacts is realized via different patterns.
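As a rough illustration, a generated star-schema load for a sales data mart could look like the following; table and column names are hypothetical:

```python
# Hypothetical generated statement: load a fact table and resolve the
# surrogate keys of its dimensions via their natural keys.
FACT_SALES_LOAD = """
INSERT INTO mart.fact_sales (date_sk, customer_sk, product_sk, amount)
SELECT d.date_sk, c.customer_sk, p.product_sk, s.amount
FROM stage.sales s
JOIN mart.dim_date     d ON d.calendar_date = s.order_date
JOIN mart.dim_customer c ON c.customer_no   = s.customer_no
JOIN mart.dim_product  p ON p.product_no    = s.product_no;
"""
```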
Layer Architectures in Data Platforms
Patterns for the transformation and integration of data into the various layers of different data platform architectures. Typical layers are landing, staging, cleansing, core, reporting and analysis; in data lake platforms, raw, landing, enrichment and consumption layers are common. Patterns handle the transfer between these layers, which can also include the transformation between modeling methods, e.g. from Data Vault to multidimensional modeling.
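A minimal sketch of such a layer transition, here from Data Vault structures in the core layer to a dimension in a data mart; all names are hypothetical and the surrogate-key handling is deliberately simplified:

```python
# Hypothetical generated statement: derive a dimension from a hub and the
# most recent rows of its satellite; a layer-transition pattern would
# generate one such statement per dimension.
DIM_FROM_VAULT = """
INSERT INTO mart.dim_customer (customer_sk, customer_no, first_name, last_name)
SELECT ROW_NUMBER() OVER (ORDER BY h.customer_no),
       h.customer_no, s.first_name, s.last_name
FROM core.hub_customer h
JOIN core.sat_customer s
  ON s.customer_hk = h.customer_hk
 AND s.load_dts = (SELECT MAX(s2.load_dts)
                   FROM core.sat_customer s2
                   WHERE s2.customer_hk = s.customer_hk);
"""
```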
Go for Design Patterns!
We will be happy to help you find individual solutions for your project and develop optimal design patterns according to your needs.