Data Integration is a discipline comprising practices, architectural techniques, and tools that allow organizations to ingest, transform, combine and provision data across various data types.
— Gartner
Reminder: Data Integration combines data from multiple sources into a single, unified view of the data.
I. Usage Scenarios and What It Includes
Usage scenarios:
- Data consistency across applications
- Master data management
- Data sharing between enterprises
- Data migration and consolidation
Includes:
- Accessing, queueing, or extracting data from operational systems
- Transforming and merging extracted data either logically or physically
- Data quality and governance
- Delivering data through an integrated approach for analytics purposes
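The items above (extracting, transforming and merging, checking quality, delivering) can be sketched in a few lines. This is only an illustrative sketch: the two sources (`CRM_ROWS`, `BILLING_CSV`), their field names, and the quality rule are hypothetical and not taken from any particular tool.

```python
import csv
import io

# Hypothetical source 1: records pulled from an operational CRM system.
CRM_ROWS = [
    {"customer_id": "1", "name": " Alice "},
    {"customer_id": "2", "name": "Bob"},
    {"customer_id": "3", "name": ""},
]

# Hypothetical source 2: a CSV export from a billing system.
BILLING_CSV = "cust_id,total\n1,120.50\n2,80.00\n4,15.25\n"


def extract_billing(text):
    """Extract: parse the CSV export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform_and_merge(crm_rows, billing_rows):
    """Map both sources to one schema and merge them by customer id."""
    unified = {}
    for row in crm_rows:
        cid = int(row["customer_id"])
        unified[cid] = {"customer_id": cid, "name": row["name"].strip(), "total": None}
    for row in billing_rows:
        cid = int(row["cust_id"])
        record = unified.setdefault(cid, {"customer_id": cid, "name": "", "total": None})
        record["total"] = float(row["total"])
    return [unified[cid] for cid in sorted(unified)]


def quality_check(records):
    """Data quality: separate complete records from those needing review."""
    def is_complete(r):
        return bool(r["name"]) and r["total"] is not None

    clean = [r for r in records if is_complete(r)]
    rejected = [r for r in records if not is_complete(r)]
    return clean, rejected


unified_view = transform_and_merge(CRM_ROWS, extract_billing(BILLING_CSV))
clean, rejected = quality_check(unified_view)

# Deliver: here we only print; a real platform would write to a warehouse or API.
for record in clean:
    print(record)
```

Only customers present in both sources with a non-empty name reach the delivered view; the rest are kept aside for review rather than silently dropped.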
II. Data Integration Workflows
- The data pipeline wraps the whole process: it covers the entire journey of data from the source to the destination.
- Data Integration combines disparate data into a unified view of the data, and ETL can be one of the processes within data integration.
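To make the relationship concrete, here is a minimal sketch of an ETL process running as one stage of a pipeline. It assumes a hypothetical CSV export (`RAW_ORDERS`) as the source and an in-memory SQLite database as the destination; the table and column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_ORDERS = "order_id,amount,currency\n1,10.00,usd\n2,5.50,USD\n3,,usd\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, cast types, normalize currency."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run_pipeline(text, conn):
    """The pipeline covers the whole journey; ETL is the work done along it."""
    return load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")
loaded = run_pipeline(RAW_ORDERS, conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
currencies = {row[0] for row in conn.execute("SELECT currency FROM orders")}
print(loaded, total, currencies)
```

A fuller pipeline would add stages around this one (scheduling, monitoring, delivery to consumers), which is why the pipeline is the outer wrapper and ETL is one process inside it.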
III. Capabilities of a Data Integration Platform
- Pre-built connectors and adapters
- Open-source architecture
- Optimization for batch processing of large-scale data, continuous data streams, or both
- Integration with Big Data sources
- Additional functionalities for data quality, governance, compliance, and security
- Portability between on-premises and different types of cloud environments
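The batch-versus-stream capability in the list above can be illustrated with a small sketch in which the same transformation serves both modes. Everything here is hypothetical (the `normalize` function, the sensor events, and the field names); real platforms handle this with dedicated engines.

```python
from typing import Iterable, Iterator, List


def normalize(event: dict) -> dict:
    """Shared transformation: lowercase the sensor id, convert to Celsius."""
    celsius = round((event["fahrenheit"] - 32) * 5 / 9, 1)
    return {"sensor": event["sensor"].lower(), "celsius": celsius}


def run_batch(events: List[dict]) -> List[dict]:
    """Batch mode: the full dataset is available and processed at once."""
    return [normalize(e) for e in events]


def run_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Stream mode: each event is processed as it arrives, one at a time."""
    for event in events:
        yield normalize(event)


def event_source() -> Iterator[dict]:
    """Hypothetical source that yields events one by one."""
    yield {"sensor": "A1", "fahrenheit": 212.0}
    yield {"sensor": "B2", "fahrenheit": 32.0}


batch_result = run_batch(list(event_source()))
stream_result = list(run_stream(event_source()))
print(batch_result == stream_result)
```

Keeping the transformation independent of how the data arrives is one way a platform can support both modes without duplicating logic.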
IV. Tools
- IBM InfoSphere Information Server, IBM Cloud Pak
- Talend Data Integration tools
- SAP, Oracle, and Microsoft also offer data integration tools.