GROUP BY: Grouping Rows for Aggregation
GROUP BY collapses multiple rows that share the same value in a column into a single summary row. It is always used alongside aggregate functions like COUNT, SUM, or AVG.
This lesson uses concepts from SELECT, FROM, and WHERE. Aggregate functions (COUNT, SUM, etc.) are covered in detail in the Aggregate Functions lesson.
Syntax
SELECT column, AGGREGATE_FUNCTION(other_column)
FROM table_name
WHERE condition -- optional, applied before grouping
GROUP BY column;
Query execution order
Understanding when GROUP BY runs helps avoid common errors:
FROM- load the tableWHERE- filter individual rowsGROUP BY- collapse rows into groupsSELECT- compute output columns (including aggregates)ORDER BY- sort the result
Examples
Example 1: Count orders per customer
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
Result:
| customer_id | order_count |
|---|---|
| 1 | 3 |
| 2 | 2 |
| 3 | 2 |
| 4 | 2 |
| 5 | 1 |
Example 2: Total revenue by status
SELECT
status,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY status;
Result:
| status | order_count | total_revenue |
|---|---|---|
| completed | 6 | 3292.93 |
| pending | 3 | 1073.85 |
| cancelled | 1 | 349.00 |
Example 3: Group by multiple columns
SELECT
p.category,
o.status,
COUNT(*) AS order_count
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category, o.status;
Result:
| category | status | order_count |
|---|---|---|
| Electronics | completed | 4 |
| Electronics | pending | 1 |
| Furniture | cancelled | 1 |
| Furniture | completed | 2 |
| Stationery | pending | 2 |
Example 4: Using positional references in GROUP BY
Instead of repeating column names, you can reference them by their position in the SELECT list. 1 means the first column, 2 means the second, and so on.
SELECT
category,
COUNT(*) AS product_count,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products
GROUP BY 1; -- 1 refers to category, the first column in SELECT
Result:
| category | product_count | cheapest | most_expensive |
|---|---|---|---|
| Electronics | 2 | 29.99 | 999.00 |
| Furniture | 2 | 199.00 | 349.00 |
| Stationery | 1 | 4.99 | 4.99 |
Positional references also work with multiple columns: GROUP BY 1, 2 groups by the first and second columns. This shorthand is common in practice but can reduce readability when the query is long or shared with others.
Common Mistake
If a column appears in SELECT but is not wrapped in an aggregate function, it must also be listed in GROUP BY. Forgetting this causes an error.
-- This fails in most databases
SELECT
customer_id,
order_date, -- not aggregated, not in GROUP BY
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
-- Fix: either add order_date to GROUP BY or remove it from SELECT
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
Practice
Write a query that counts the number of orders and calculates the total revenue for each product_id.
Show answer
SELECT
product_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY product_id;
Expected result:
| product_id | order_count | total_revenue |
|---|---|---|
| 1 | 3 | 2997.00 |
| 4 | 1 | 796.00 |
| 3 | 2 | 698.00 |
| 2 | 2 | 149.95 |
| 5 | 2 | 74.85 |