Skip to main content

Introduction to SQL

SQL (Structured Query Language) is the standard language for communicating with relational databases. Almost every data tool you'll encounter - data warehouses, BI platforms, dbt - speaks SQL under the hood.

What Is a Relational Database?

A relational database organizes data into tables (rows and columns), similar to a spreadsheet. Each table represents one type of entity - for example, orders, customers, or products. Tables are linked together through keys.

Key Concepts

Table: a collection of rows and columns. Each row is one record; each column is one attribute.

Primary key (PK): a column (or combination of columns) that uniquely identifies each row in a table. No two rows can share the same primary key value, and it can never be NULL.

Foreign key (FK): a column that references the primary key of another table. This is how tables are linked. For example, orders.customer_id holds the customer_id of the customer who placed that order.

Schema: the structure of a database: which tables exist, which columns each table has, and what data type each column holds (integer, text, date, etc.).

Query: a SQL statement that reads or manipulates data. Most queries you write as an analyst are SELECT statements that retrieve rows from one or more tables.

NULL: the absence of a value. Not zero, not an empty string, just nothing. NULL requires special handling in SQL and will come up throughout these lessons.

Sample Dataset

All lessons in this series use the same three tables from a fictional online store: customers, products, and orders. Every example and exercise is written against this data, so it's worth spending a minute getting familiar with it before diving into the lessons.

The tables are linked through foreign keys: orders.customer_id references customers.customer_id, and orders.product_id references products.product_id. This is what makes it a relational dataset: you can combine information across tables by matching those keys.

customers

Stores one row per registered customer. customer_id is the primary key.

customer_idnameemailcitycountry
1Alice Johnsonalice@example.comNew YorkUSA
2Bob Smithbob@example.comLondonUK
3Carol Whitecarol@example.comSydneyAustralia
4Dave Browndave@example.comTorontoCanada
5Eve Daviseve@example.comNew YorkUSA
6Frank Leefrank@example.comNULLUSA

products

Stores one row per product in the catalog. product_id is the primary key. Products span three categories: Electronics, Furniture, and Stationery.

product_idnamecategoryprice
1LaptopElectronics999.00
2MouseElectronics29.99
3DeskFurniture349.00
4ChairFurniture199.00
5NotebookStationery4.99

orders

Stores one row per order line. Each order belongs to one customer and covers one product. order_id is the primary key. customer_id and product_id are foreign keys back to the other two tables.

order_idcustomer_idproduct_idorder_datequantitytotal_amountstatus
1112024-01-051999.00completed
2122024-01-10259.98completed
3232024-01-151349.00completed
4342024-02-014796.00completed
5452024-02-101049.90NULL
6132024-03-011349.00cancelled
7212024-03-151999.00completed
8522024-04-01389.97completed
9312024-04-101999.00pending
10452024-05-01524.95pending

SQL Playground

Throughout this course you will use two interactive tools.

Playground (free-form): write and run any query against the dataset. There is no right or wrong answer. Use it to explore, experiment, and verify what you are learning. You will find a playground on most lesson pages.

Exercise (graded): found on exam pages. Same editor, but with a Submit button. It checks whether your query returns the exact rows and columns expected and tells you if you passed.

Both tools work fully in the browser with no setup required.

Parts of the playground

Schema toggle: the "Schema" button at the top reveals the table names and column definitions. Open it whenever you need a reminder of what columns are available.

Editor: the text area where you write SQL. You can write multi-line queries.

Run / Cmd+Enter: executes the query and shows results below.

Output table: displays the returned rows, the column headers, and a row count. NULL values appear styled differently from regular values.

Try it now. Run the default query, then modify it to explore the data. The playground is at the bottom of this page.

Donate to this project