A frequently asked question is “when should I use data virtualization and when should I use ETL tools?” Other variants of this question are “does data virtualization replace ETL?” and “I’ve already got ETL, why do I need data virtualization?” This Denodo Technologies architecture brief answers these questions.
Extract, Transform, and Load (ETL) is a good solution for physical data consolidation projects, which copy data from the original data sources into an enterprise data warehouse (EDW) or a new database.
Typical ETL use cases include:
- Bulk copying very large data sets, comprising millions of rows, from large structured data sources.
- Creating historical records of data, e.g. snapshots at a particular time, to analyze how the data set changes over time.
- Performing complex, multi-pass data transformation and cleansing operations, and bulk loading the data into a target data store.
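To make the physical-consolidation pattern above concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a hypothetical `customers` source table and a hypothetical `customer_history` warehouse table. Each run extracts all rows, applies a single cleansing pass (real ETL jobs are often multi-pass), and bulk-loads a dated snapshot, so repeated runs build up a history that can be analyzed over time. It is an illustration of the pattern, not how any particular ETL tool works.

```python
import sqlite3


def etl_snapshot(source: sqlite3.Connection,
                 warehouse: sqlite3.Connection,
                 snapshot_date: str) -> int:
    """Hypothetical ETL job: extract customers, cleanse them, load a dated snapshot."""
    # Extract: read every row from the operational source
    rows = source.execute("SELECT id, name, country FROM customers").fetchall()

    # Transform: trim names, normalize country codes, drop rows with no name
    cleaned = [
        (snapshot_date, cid, name.strip(), (country or "").strip().upper())
        for cid, name, country in rows
        if name and name.strip()
    ]

    # Load: bulk insert into a history table keyed by snapshot date
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS customer_history ("
        "snapshot_date TEXT, id INTEGER, name TEXT, country TEXT)"
    )
    warehouse.executemany(
        "INSERT INTO customer_history VALUES (?, ?, ?, ?)", cleaned
    )
    warehouse.commit()
    return len(cleaned)


# Usage: two snapshots of a changing source produce a queryable history
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
src.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, " Alice ", "us"), (2, "", "fr"), (3, "Bob", " de ")],
)
dw = sqlite3.connect(":memory:")

loaded = etl_snapshot(src, dw, "2024-01-31")
src.execute("UPDATE customers SET country = 'GB' WHERE id = 3")
etl_snapshot(src, dw, "2024-02-29")

bob_history = dw.execute(
    "SELECT snapshot_date, country FROM customer_history "
    "WHERE id = 3 ORDER BY snapshot_date"
).fetchall()
```

Because the data is physically duplicated, the warehouse keeps the January value for Bob even after the source changes, which is exactly what historical analysis needs.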
In practice, data virtualization and ETL solve different problems, but they are often complementary technologies rather than replacements for each other. Data virtualization can extend and enhance ETL/EDW deployments in many ways, for example:
- Extending existing data warehouses with new data sources.
- Federating multiple data warehouses.
- Acting as a virtual data source to augment an ETL process.
- Isolating applications from changes to the underlying data sources (e.g. when migrating a data warehouse).
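The contrast with ETL can be sketched in a few lines of Python. The `VirtualLayer` class, the `customer_revenue` view, and the `sales`/`customers` tables below are all hypothetical and do not represent Denodo's API or query language. The sketch assumes an existing warehouse (`dw`) and a new source not yet loaded into it (`crm`). The layer maps a logical view name to a resolver that queries both sources at request time and joins the results, so no data is copied; consumers only ever call `query()` with the logical name.

```python
import sqlite3
from typing import Callable


class VirtualLayer:
    """Hypothetical data-virtualization layer: logical views are resolved
    against live sources at query time; nothing is copied or stored."""

    def __init__(self) -> None:
        self._views: dict[str, Callable[[], list[tuple]]] = {}

    def define_view(self, name: str, resolver: Callable[[], list[tuple]]) -> None:
        # (Re)defining a view only changes this mapping; applications keep
        # using the same logical name, e.g. after a warehouse migration.
        self._views[name] = resolver

    def query(self, name: str) -> list[tuple]:
        return self._views[name]()


def customer_revenue(dw: sqlite3.Connection, crm: sqlite3.Connection) -> list[tuple]:
    # Federate: aggregate sales in the warehouse, then join with the CRM source
    revenue = dict(
        dw.execute("SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id")
    )
    return [
        (cid, name, revenue.get(cid, 0))
        for cid, name in crm.execute("SELECT id, name FROM customers ORDER BY id")
    ]


# Usage: an existing warehouse extended with a new data source
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
dw.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 20.0)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob"), (3, "Carol")])

layer = VirtualLayer()
layer.define_view("customer_revenue", lambda: customer_revenue(dw, crm))
rows = layer.query("customer_revenue")

# A new sale is visible on the next query without any reload job
dw.execute("INSERT INTO sales VALUES (3, 5.0)")
rows_after = layer.query("customer_revenue")
```

The design choice illustrated here is indirection: if the warehouse were migrated, only the resolver passed to `define_view` would change, while every application calling `layer.query("customer_revenue")` would remain untouched. The trade-off, compared with the ETL sketch, is that no history is retained, because each query sees only the current state of the sources.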