Skip to content
Photo by Myriam Jessier on Unsplash

Deriving Business Metrics via SQL: The First-timer

From the shoes of an analyst

Picture this - you've just joined an eCommerce company as a data analyst. Your boss greets you on your first day at work, either face-to-face or virtually. He is happy that he has just got the company's first analyst - you, and is keen to get you up to speed with the pulse of the business.

Being the first analyst for the company means setting up the company's first report or dashboard.

This includes communicating key performance indicators (KPIs) regularly.

There is an existing database of tables stored on Google BigQuery for you an analyst to extract important numbers.

"But how do I extract those numbers?", you ask.

We are here to help.

What you will able to do after reading this article:

  • Access Google BigQuery
  • Write simple SQL (Structured Query Language, pronounced as either "sequel" or "S Q L") queries
  • Retrieve the most important KPI with SQL

Accessing Google BigQuery

First off, ensure you have an exising Google account like a Gmail account.

Sign in using that account of yours at https://console.cloud.google.com/bigquery .

Give a pat on your back if you see this image below once you're signed in.

bigquery_1

Click on `Compose A New Query` in your SQL workspace.

As a start, copy the SQL query below and paste it into your SQL workspace.

SELECT *
FROM `bigquery-public-data.thelook_ecommerce.orders`
LIMIT 5;

 Go ahead and run the query either by clicking on `Run` or highlighting the query and hit Ctrl and Enter buttons on your keyboard at the same time (if you're on PC). 
 
Give another pat on the back if you are able to see the image below as a result.
 
Screenshot (25)
 
From here, we see that SQL is the language we use to communicate with databases.
 
Like a spoken human language, we need to be aware of how we structure our sentences (in this case, SQL queries).
 
So what the query did was to return just 5 rows for all fields (or columns) from a specific table.
 

Write simple SQL queries

Let's formalise our understanding of SQL queries.

We start with the most important template or structure of a SQL query, shown below.

SELECT <insert field names>
FROM <insert table name>
LIMIT <insert maximum number of rows to be shown>
; (semi-colon to indicate end of query)

Basically, we are telling the database to go into that specific table (after the FROM clause), extract all fields in that table (after the SELECT clause), but only showing a maximum number of records or rows (after the LIMIT clause).

With any table we work with for the first time, I suggest doing these tasks below using SQL to better understand the data stored in the table `bigquery-public-data.thelook_ecommerce.orders`:

  1. Have a look at how the data is stored
    • SQL template: SELECT * FROM <insert table name> LIMIT 5;
  2. Determine the number of categories for important fields, e.g. order status, type of customers, product type, country
    • SQL template: SELECT DISTINCT <insert field name> FROM <insert table name>;
  3. Find out the date range covered in that table, e.g. order date, registration date. This is important to know if you're having the latest data (i.e. data is up to most recent date)
    • SQL template: SELECT MIN(<insert date field>), MAX(<insert same date field>) FROM <insert table name>

We did the first task above, so let's continue with tasks 2 and 3.

For task 2, we are interested to see the different order types we have for our eCommerce company. The query below will help us answer the question.

SELECT DISTINCT status
FROM `bigquery-public-data.thelook_ecommerce.orders`;

Running the query above gives the output below.

status
Shipped
Complete
Returned
Cancelled
Processing
Notice that we have cancelled and returned orders, which we might need to exclude later on when deriving metrics such as total orders or total sales.
 
For task 3, we need to know if we have orders updated into the table as of the previous day. Run the query below to see if we have the latest data.
 

SELECT MIN(created_at) AS first_order_date, MAX(created_at) AS last_order_date
FROM `bigquery-public-data.thelook_ecommerce.orders`;

Depending on when we run the query above, as data may get updated (where new records are added), our output may be different. This is what I got at the time of writing this article (notice that the values below are in date-time format):

first_order_date last_order_date
2019-01-11 10:32:00 UTC 2023-10-28 19:50:39.668163 UTC

Retrieve the most important KPI with SQL

As an analyst at this eCommerce company, or with any company in fact, we need to know what is that one metric (KPI) the company talks about each day, to measure its performance.

With an eCommerce company, the latest total orders is a sure bet to be its north star metric.

Let's end off this article by dissecting the query below to get our KPI - latest total orders.

Before diving into writing our SQL, as an analyst, the key challenge we face often is translating business questions into technical ones.

Here's the business question - What are our latest order numbers?

Clearly in SQL, we cannot just copy-and-paste that question into the database, and run it. It ain't ChatGPT!

So, we are going to rephrase the question into a technical one, one that a SQL database understands.

"Order numbers" refer to number of orders made (let's stick to this definition for now), excluding cancelled or returned orders.

In SQL, this would translate into getting total number of unique order IDs, while excluding (using the WHERE clause and NOT IN clause) specific order statuses, i.e. cancelled or returned.

We will have the query below as a result (since order ID is unique for every record in the table, using the COUNT function suffices instead of adding the DISTINCT clause, since order ID is a primary key):

SELECT COUNT(order_id) AS total_sales
FROM `bigquery-public-data.thelook_ecommerce.orders`
WHERE status NOT IN ('Cancelled', 'Returned')

 

Not forgetting "latest", where it basically implies we look at perhaps the previous day's orders. Let us assume for this eCommerce company, latest would mean the day before yesterday, i.e. previous two days (to avoid the situation when BigQuery has yet to add data from the previous day as of the current day and time).

In SQL then, we add a date filter (after the WHERE clause) to look at orders made on the previous two days, like the one below:

WHERE DATE(created_at) = DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY)

We see some functions (functions receive inputs to produce an output) used above:

  • DATE: Here, we are converting the type of field (input) to a date format (output), without timestamp
  • CURRENT_DATE: This gives us today's date (output)
  • DATE_ADD: We are telling our SQL database to add (or subtract) X number of days (or weeks, months, years) from a specific date (input) using the INTERVAL clause

For more information on BigQuery's functions, head over to their documentation here.

Eventually, we end up with the SQL query below to extract our KPI which answers the business question ("What are our latest order numbers?"):

SELECT COUNT(order_id) AS total_orders
FROM `bigquery-public-data.thelook_ecommerce.orders`
WHERE status NOT IN ('Cancelled', 'Returned')
AND DATE(created_at) = DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY)

 

Running the query above gives us the following output (output will differ depending on when we run the query),

total_orders
1261

This means we have clinched 1,261 orders two days ago.

Is this considered an insight?

Yes, only if we are able to determine if the number 1,261 is good or bad. We shall discuss more on this in the next article in this series.

And yes, if we are also able to break this number down into different categories such as sales by region, sales by product etc. This will help us determine our top-performing products, high-value customer segments etc.

Conclusion

Now that we have learnt the needful to get started on our SQL journey as an analyst, we shall experience what it takes to be a relevant analyst which means focusing more on generating actionable insights, instead of merely pulling numbers from the database.

The articles after this will be essential in transforming us into a rockstar analyst.

Stay tuned by subscribing to our mailing list so you don't miss out on our new exciting articles when they're released!

Subscribe to stay tuned!