Data Integration is a discipline comprising practices, architectural techniques, and tools that allow organizations to ingest, transform, combine, and provision data across various data types.

— Gartner

Reminder: Data integration combines data from multiple sources into a single, unified view.

I. Usage Scenarios and Scope

Usage scenarios:

  • Data consistency across applications
  • Master data management
  • Data sharing between enterprises
  • Data migration and consolidation

Data integration includes:

  • Accessing, queueing, or extracting data from operational systems
  • Transforming and merging extracted data either logically or physically
  • Data quality and governance
  • Delivering data through an integrated approach for analytics purposes
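
The activities listed above can be sketched in a few lines of Python. This is a minimal illustration, not a real platform: the two source lists (crm_rows, billing_rows), the field names, and the helper functions integrate and quality_issues are all hypothetical. It extracts records from two systems, merges them on a shared key, and runs a simple data quality check before the unified view is delivered.

```python
# Hypothetical records from two operational systems (illustrative data only).
crm_rows = [
    {"customer_id": 1, "name": "Ada Lovelace", "email": "ADA@EXAMPLE.COM"},
    {"customer_id": 2, "name": "Alan Turing", "email": "alan@example.com"},
]
billing_rows = [
    {"customer_id": 1, "balance": 120.0},
    {"customer_id": 3, "balance": 45.5},
]


def integrate(crm, billing):
    """Merge both sources on customer_id into one unified record per customer."""
    unified = {}
    for row in crm:
        # Transform while merging: normalize the email casing.
        unified[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "name": row["name"],
            "email": row["email"].lower(),
            "balance": None,
        }
    for row in billing:
        # Customers known only to billing still get a (partially empty) record.
        record = unified.setdefault(
            row["customer_id"],
            {"customer_id": row["customer_id"], "name": None, "email": None, "balance": None},
        )
        record["balance"] = row["balance"]
    return [unified[key] for key in sorted(unified)]


def quality_issues(records):
    """Data quality check: list (customer_id, field) pairs whose value is missing."""
    return [
        (r["customer_id"], field)
        for r in records
        for field, value in r.items()
        if value is None
    ]


unified_view = integrate(crm_rows, billing_rows)
issues = quality_issues(unified_view)
```

Here unified_view holds one record per customer across both systems, and issues flags the fields that one source could not supply, which is the kind of gap a governance process would follow up on.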

II. Data Integration Workflows

Data Pipelines, Data Integration Tools and ETL

  • A data pipeline wraps the whole process: it covers the entire journey of data from the source to the destination.
  • Data integration combines disparate data into a unified view, and ETL is one process that can be used within data integration.
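
To make the relationship between a pipeline and ETL concrete, here is a small sketch under stated assumptions: the CSV text, the function names, and the Python list standing in for a warehouse table are all hypothetical. The extract, transform, and load functions form the ETL part, while run_pipeline represents the wider pipeline that owns the full journey, including a step after loading.

```python
import csv
import io
import json

# Hypothetical source: CSV text standing in for an export from an operational system.
SOURCE_CSV = "order_id,amount\n1,10.50\n2,4.25\n"


def extract(csv_text):
    """Extract: read raw rows (all values are strings) from the source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: cast each field to the type the destination expects."""
    return [{"order_id": int(r["order_id"]), "amount": float(r["amount"])} for r in rows]


def load(rows, destination):
    """Load: append rows to the destination (a list standing in for a table)."""
    destination.extend(rows)


def run_pipeline(csv_text, destination):
    """Pipeline: runs the ETL steps, then a further step that ETL itself does not cover."""
    rows = transform(extract(csv_text))
    load(rows, destination)
    # Post-load step owned by the pipeline: publish a summary for downstream consumers.
    return json.dumps({"rows_loaded": len(rows), "total": sum(r["amount"] for r in rows)})


warehouse = []
summary = run_pipeline(SOURCE_CSV, warehouse)
```

Keeping the three ETL functions separate from run_pipeline mirrors the distinction drawn above: ETL can be swapped for another integration process without changing the pipeline's overall shape.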

III. Capabilities of a Data Integration Platform

  • Pre-built connectors and adapters
  • Open-source architecture
  • Optimization for batch processing of large-scale data, continuous data streams, or both
  • Integration with Big Data sources
  • Additional functionality for data quality, governance, compliance, and security
  • Portability between on-premise and different types of cloud environments
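
The batch and streaming capability in the list above can be illustrated with a short sketch. Everything here is hypothetical (the clean transformation, the record fields, and the event_source generator standing in for a continuous feed such as a message queue); it only shows how one transformation can serve both processing modes.

```python
from typing import Iterable, Iterator


def clean(record: dict) -> dict:
    """Shared transformation: trim whitespace and upper-case the country code."""
    return {"user": record["user"].strip(), "country": record["country"].strip().upper()}


def process_batch(records: list) -> list:
    """Batch mode: the full data set is available and is processed in one pass."""
    return [clean(r) for r in records]


def process_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Streaming mode: each record is transformed as it arrives, one at a time."""
    for record in records:
        yield clean(record)


def event_source() -> Iterator[dict]:
    """Hypothetical stand-in for a continuous feed; yields records one by one."""
    yield {"user": " ada ", "country": "gb"}
    yield {"user": "alan", "country": " gb "}


batch_result = process_batch([{"user": " ada ", "country": "gb"}])
stream_result = list(process_stream(event_source()))
```

Because process_stream is a generator, it never needs the whole input in memory, which is the property that matters when the source is unbounded.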

IV. Tools

  • IBM InfoSphere Information Server, IBM Cloud Pak
  • Talend Data Integration tools
  • SAP, Oracle, and Microsoft also offer data integration tools.