What Does ETL Stand For?

A non-developer guide to extract, transform, load.

Feb 24, 2022

The end goal of the ETL process is to have a useful set of data that can be analyzed. After the extraction step, we have data, but in its raw state it’s unlikely to be helpful. This is because data is often “dirty.” It can have typos, missing fields, empty rows, etc. The problem only gets worse when you bring together data from multiple sources.

Before we run any analysis, we need to ensure our data is clean by transforming it into a useful format. For instance, we may want to filter out customers missing email addresses or integrate our Shopify data with our Stripe data. These transformations prepare the data for analysis.
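If you’re curious what a transformation can look like in code, here is a minimal Python sketch of the two examples above: dropping customers with no email address and combining customer records with payment records. The sample records, field names, and the `transform` function are all made up for illustration; real Shopify and Stripe exports will look different.

```python
# Hypothetical sample records; real exports would come from the extract step.
shopify_customers = [
    {"email": "ana@example.com", "name": "Ana"},
    {"email": "", "name": "Ben"},                  # missing email
    {"email": " Cy@Example.com ", "name": "Cy"},   # stray spaces and capitals
]
stripe_payments = [
    {"email": "ana@example.com", "amount_usd": 40},
    {"email": "cy@example.com", "amount_usd": 15},
    {"email": "ana@example.com", "amount_usd": 10},
]


def transform(customers, payments):
    """Drop customers without an email, then attach each customer's total payments."""
    # Add up payments per (normalized) email address.
    totals = {}
    for payment in payments:
        email = payment["email"].strip().lower()
        totals[email] = totals.get(email, 0) + payment["amount_usd"]

    cleaned = []
    for customer in customers:
        email = (customer.get("email") or "").strip().lower()
        if not email:
            continue  # filter out customers missing an email address
        cleaned.append({
            "email": email,
            "name": customer["name"],
            "total_paid_usd": totals.get(email, 0),
        })
    return cleaned


clean_rows = transform(shopify_customers, stripe_payments)
```

In this sketch, Ben is dropped because his email is blank, and the remaining customers each get a single combined record with their payment total.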

Transform = clean the data

➡️ Load

We’re now in a good spot. We have a bunch of data in a useful format. The final thing we need to do is store it somewhere so we can analyze it. The most common place to store this data is in a “data warehouse.” If you’ve ever heard of a company called Snowflake, this is what they do. In the final step of ETL, we load the transformed data into a data warehouse.
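As a rough illustration of the load step, the Python sketch below stores a few cleaned rows in a database and then runs a simple query on them. It uses an in-memory SQLite database as a stand-in for a real data warehouse; the table name, columns, and sample rows are assumptions made for this example, and an actual warehouse such as Snowflake would be reached through its own connector.

```python
import sqlite3

# Hypothetical rows, shaped like the output of a transform step.
clean_rows = [
    ("ana@example.com", "Ana", 50),
    ("cy@example.com", "Cy", 15),
]

# An in-memory SQLite database stands in for the data warehouse here.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customers (email TEXT PRIMARY KEY, name TEXT, total_paid_usd INTEGER)"
)
warehouse.executemany("INSERT INTO customers VALUES (?, ?, ?)", clean_rows)
warehouse.commit()

# Once loaded, the data can be analyzed with ordinary SQL.
total_revenue = warehouse.execute(
    "SELECT SUM(total_paid_usd) FROM customers"
).fetchone()[0]
```

The point is only that loading means writing the cleaned rows somewhere they can be queried later.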

Load = store the data

Voilà! That’s the basic idea behind ETL. Not too bad, right?

Further Exploration

This post is just an introduction to the world of ETL. If you’re interested in going a bit further, here are a few resources and things to look out for:

Further Reading

Justin Gage of Technically.dev wrote a (subscribers-only) post called What’s ETL?. It goes deeper than this article while remaining very friendly to non-developers. He also wrote a similarly fantastic post about data warehouses.

Common ETL Tools

Developers can write ETL jobs, but it’s becoming increasingly popular to use third-party tools to handle this process. The most common ones today include:

ETL vs ELT

ELT and ETL are very similar, except that with ELT you load the data into the warehouse before transforming it. There are reasons this ordering can be advantageous, but for the purposes of this guide, the two serve the same function.

Check out Panoply’s post ETL vs ELT: The Difference is in the How if you want the details.
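To make the difference in ordering concrete, here is a small Python sketch of the ELT pattern: the raw data is loaded first, untouched, and the cleaning happens afterward inside the database. As before, an in-memory SQLite database stands in for the warehouse, and the table names and sample records are invented for illustration.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as extracted (blank email included).
raw_customers = [
    ("ana@example.com", "Ana"),
    ("", "Ben"),
    ("cy@example.com", "Cy"),
]

warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse

# Load first: the untouched data goes straight into a raw table.
warehouse.execute("CREATE TABLE raw_customers (email TEXT, name TEXT)")
warehouse.executemany("INSERT INTO raw_customers VALUES (?, ?)", raw_customers)

# Transform second: the cleaning runs inside the warehouse as SQL.
warehouse.execute(
    "CREATE TABLE customers AS "
    "SELECT email, name FROM raw_customers WHERE email <> ''"
)

raw_count = warehouse.execute("SELECT COUNT(*) FROM raw_customers").fetchone()[0]
cleaned_count = warehouse.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Both the raw table and the cleaned table end up living in the warehouse, which is the main practical difference from the ETL flow described earlier.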

Reverse ETL

Now that you’re familiar with the term “ETL,” you may start to see “Reverse ETL” thrown around. Reverse ETL is the process of taking the data that was just ETL’d into your data warehouse and sending it back out to your SaaS apps.

Here’s the thinking: if your most accurate data now lives in your data warehouse, you want that data available to everyone. For example, suppose you previously ETL’d your Stripe data into your warehouse. You can now reverse ETL that data into Zendesk, so your customer support reps have access to the latest financial data.
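For a sense of what reverse ETL involves, the Python sketch below reads payment totals out of a warehouse and shapes them into updates for a support tool. Everything here is hypothetical: SQLite stands in for the warehouse, the table and field names are invented, and `send_to_support_tool` only prints what it would send rather than calling any real Zendesk API.

```python
import sqlite3

# Stand-in warehouse that already holds ETL'd payment totals (hypothetical data).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (email TEXT, total_paid_usd INTEGER)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("ana@example.com", 50), ("cy@example.com", 15)],
)


def build_support_updates(conn):
    """Read rows from the warehouse and shape them as updates for a support tool."""
    rows = conn.execute("SELECT email, total_paid_usd FROM customers ORDER BY email")
    return [
        {"email": email, "fields": {"total_paid_usd": total}}
        for email, total in rows
    ]


def send_to_support_tool(update):
    """Hypothetical sender: a real one would call the support tool's API."""
    print(f"Would update {update['email']} with {update['fields']}")


updates = build_support_updates(warehouse)
for update in updates:
    send_to_support_tool(update)
```

The direction of travel is the only thing that changes: the warehouse is now the source, and the SaaS app is the destination.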

Popular reverse ETL tools include: