What Does ETL Stand For?
A non-developer guide to extract, transform, load.
ETL stands for extract, transform, load.
Together, these processes create a data pipeline that moves data from internal systems into a data warehouse.
^Now, what the heck does that mean?
If you’re not a developer, don’t feel bad if you’re overwhelmed by all the jargon. When I was first introduced to ETL, this confused me too and made me think the concepts were complex. While implementing data pipelines can be tricky, the underlying concept of ETL is actually not complicated.
Whether or not you’re a developer, ETL is important to understand since it's used by so many companies. This post explains the concept of ETL in layman’s terms so anyone can understand it (even if you’re a no-coder like me 🙂).
ETL exists within the context of data engineering as a whole. Companies today generate tons of data. Every click in an app, email in Zendesk, or payment in Stripe is a row of data. Someone realized that if we collect all this data, we can analyze it to generate helpful insights.
For example, it's good to know how many users churned or what percentage of support requests go unanswered. A large part of a data engineer’s job today is wrangling all this data so that we can find these answers. This is where ETL can help.
ETL is the process of moving data from the place it’s generated (eg. your app) to a place where it can be analyzed (eg. your data warehouse). As the acronym suggests, there are three distinct components of ETL that occur sequentially. Let’s explore each one by one.
By default, most data is siloed. It lives in all the tools and databases your company uses, like Stripe or Airtable. The first step in ETL is getting this data out of these sources by extracting it from each one. This data pull usually happens on a defined schedule, like once every hour.
Extract = get the data
The end goal of the ETL process is to have a useful set of data that can be analyzed. Thru the extraction process, we have data, but right now it’s unlikely to be helpful. This is because data is often “dirty.” It can have typos, missing fields, empty rows, etc. This problem is only exacerbated when you bring together data from multiple sources.
Before we run any analysis, we need to ensure our data is clean by transforming it into a useful format. For instance, we may want to filter out customers missing email addresses or integrate our Shopify data with our Stripe data. These transformations prepare the data for analysis.
Transform = clean the data
We’re now in a good spot. We have a bunch of data in a useful format. The final thing we need to do is store it somewhere so we can analyze it. The most common place to store this data is in a “data warehouse.” If you’ve ever heard of a company called Snowflake, this is what they do. In the final step of ETL, we load the transformed data into a data warehouse.
Load = store the data
Voilà! That’s the basic idea behind ETL. Not too bad, right?
This post is just an introduction to the world of ETL. If you’re interested in going a bit further, here’s a few resources and things to look out for:
Justin Gage of Technically.dev, wrote a (subscriber’s only) post called What’s ETL?. It goes deeper than this article while remaining very friendly to non-developers. He also wrote a similarly fantastic post about data warehouses.
Common ETL Tools
Developers can write ETL jobs, but it’s becoming increasingly popular to use third-party tools to handle this process. The most common ones today include:
ETL vs ELT
ELT and ETL are very similar, except with ELT, you load data into the warehouse before transforming it. There are reasons why this ordering can be advantageous, but for purposes of our understanding, they serve the same function.
Check out Panoply’s post ETL vs ELT: The Difference is in the How if you want the details.
Now that you’re familiar with the term “ETL,” you may start to see “Reverse ETL” thrown around. Reverse ETL is the process of taking the data that was just ETL’d into your data warehouse and sending it back out to your SaaS apps.
Here’s the thinking: if your most accurate data now lives in your data warehouse, you want that data available to everyone. For example, suppose you previously ETL’d your Stripe data into your warehouse. You can now reverse ETL that data into Zendesk, so your customer support reps have access to the latest financial data.
Popular reverse ETL tools include:
Stay up to date with the latest no-code data news, strategies, and insights sent straight to your inbox!
We keep your data in sync, so you can focus on your product.