Deriving Business Metrics via SQL: The First-timer
From the shoes of an analyst
Picture this - you've just joined an eCommerce company as a data analyst. Your boss greets you on your first day at work, either face-to-face or virtually. He is happy that he has just got the company's first analyst - you, and is keen to get you up to speed with the pulse of the business.
Being the first analyst for the company means setting up the company's first report or dashboard.
This includes communicating key performance indicators (KPIs) regularly.
There is an existing database of tables stored on Google BigQuery for you an analyst to extract important numbers.
"But how do I extract those numbers?", you ask.
We are here to help.
What you will able to do after reading this article:
- Access Google BigQuery
- Write simple SQL (Structured Query Language, pronounced as either "sequel" or "S Q L") queries
- Retrieve the most important KPI with SQL
Accessing Google BigQuery
First off, ensure you have an exising Google account like a Gmail account.
Sign in using that account of yours at https://console.cloud.google.com/bigquery .
Give a pat on your back if you see this image below once you're signed in.
Click on `Compose A New Query` in your SQL workspace.
As a start, copy the SQL query below and paste it into your SQL workspace.
SELECT *
FROM `bigquery-public-data.thelook_ecommerce.orders`
LIMIT 5;
Write simple SQL queries
Let's formalise our understanding of SQL queries.
We start with the most important template or structure of a SQL query, shown below.
SELECT <insert field names>
; (semi-colon to indicate end of query)
FROM <insert table name>
LIMIT <insert maximum number of rows to be shown>
Basically, we are telling the database to go into that specific table (after the FROM clause), extract all fields in that table (after the SELECT clause), but only showing a maximum number of records or rows (after the LIMIT clause).
With any table we work with for the first time, I suggest doing these tasks below using SQL to better understand the data stored in the table `bigquery-public-data.thelook_ecommerce.orders`
:
- Have a look at how the data is stored
- SQL template: SELECT * FROM <insert table name> LIMIT 5;
- Determine the number of categories for important fields, e.g. order status, type of customers, product type, country
- SQL template: SELECT DISTINCT <insert field name> FROM <insert table name>;
- Find out the date range covered in that table, e.g. order date, registration date. This is important to know if you're having the latest data (i.e. data is up to most recent date)
- SQL template: SELECT MIN(<insert date field>), MAX(<insert same date field>) FROM <insert table name>
We did the first task above, so let's continue with tasks 2 and 3.
For task 2, we are interested to see the different order types we have for our eCommerce company. The query below will help us answer the question.
SELECT DISTINCT status
FROM `bigquery-public-data.thelook_ecommerce.orders`;
Running the query above gives the output below.
status |
Shipped |
Complete |
Returned |
Cancelled |
Processing |
SELECT MIN(created_at) AS first_order_date, MAX(created_at) AS last_order_date
FROM `bigquery-public-data.thelook_ecommerce.orders`;
Depending on when we run the query above, as data may get updated (where new records are added), our output may be different. This is what I got at the time of writing this article (notice that the values below are in date-time format):
first_order_date | last_order_date |
2019-01-11 10:32:00 UTC | 2023-10-28 19:50:39.668163 UTC |
Retrieve the most important KPI with SQL
As an analyst at this eCommerce company, or with any company in fact, we need to know what is that one metric (KPI) the company talks about each day, to measure its performance.
With an eCommerce company, the latest total orders is a sure bet to be its north star metric.
Let's end off this article by dissecting the query below to get our KPI - latest total orders.
Before diving into writing our SQL, as an analyst, the key challenge we face often is translating business questions into technical ones.
Here's the business question - What are our latest order numbers?
Clearly in SQL, we cannot just copy-and-paste that question into the database, and run it. It ain't ChatGPT!
So, we are going to rephrase the question into a technical one, one that a SQL database understands.
"Order numbers" refer to number of orders made (let's stick to this definition for now), excluding cancelled or returned orders.
In SQL, this would translate into getting total number of unique order IDs, while excluding (using the WHERE
clause and NOT IN
clause) specific order statuses, i.e. cancelled or returned.
We will have the query below as a result (since order ID is unique for every record in the table, using the COUNT
function suffices instead of adding the DISTINCT
clause, since order ID is a primary key):
SELECT COUNT(order_id) AS total_sales
FROM `bigquery-public-data.thelook_ecommerce.orders`
WHERE status NOT IN ('Cancelled', 'Returned')
Not forgetting "latest", where it basically implies we look at perhaps the previous day's orders. Let us assume for this eCommerce company, latest would mean the day before yesterday, i.e. previous two days (to avoid the situation when BigQuery has yet to add data from the previous day as of the current day and time).
In SQL then, we add a date filter (after the WHERE
clause) to look at orders made on the previous two days, like the one below:
WHERE DATE(created_at) = DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY)
We see some functions (functions receive inputs to produce an output) used above:
DATE
: Here, we are converting the type of field (input) to a date format (output), without timestampCURRENT_DATE
: This gives us today's date (output)DATE_ADD
: We are telling our SQL database to add (or subtract) X number of days (or weeks, months, years) from a specific date (input) using theINTERVAL
clause
For more information on BigQuery's functions, head over to their documentation here.
Eventually, we end up with the SQL query below to extract our KPI which answers the business question ("What are our latest order numbers?"):
SELECT COUNT(order_id) AS total_orders
FROM `bigquery-public-data.thelook_ecommerce.orders`
WHERE status NOT IN ('Cancelled', 'Returned')
AND DATE(created_at) = DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY)
Running the query above gives us the following output (output will differ depending on when we run the query),
total_orders |
1261 |
This means we have clinched 1,261 orders two days ago.
Is this considered an insight?
Yes, only if we are able to determine if the number 1,261 is good or bad. We shall discuss more on this in the next article in this series.
And yes, if we are also able to break this number down into different categories such as sales by region, sales by product etc. This will help us determine our top-performing products, high-value customer segments etc.
Conclusion
Now that we have learnt the needful to get started on our SQL journey as an analyst, we shall experience what it takes to be a relevant analyst which means focusing more on generating actionable insights, instead of merely pulling numbers from the database.
The articles after this will be essential in transforming us into a rockstar analyst.
Stay tuned by subscribing to our mailing list so you don't miss out on our new exciting articles when they're released!