Skip to main content

GROUP BY: Grouping Rows for Aggregation

GROUP BY collapses multiple rows that share the same value in a column into a single summary row. It is always used alongside aggregate functions like COUNT, SUM, or AVG.

Prior lessons

This lesson uses concepts from SELECT, FROM, and WHERE. Aggregate functions (COUNT, SUM, etc.) are covered in detail in the Aggregate Functions lesson.

Syntax

SELECT column, AGGREGATE_FUNCTION(other_column)
FROM table_name
WHERE condition -- optional, applied before grouping
GROUP BY column;

Query execution order

Understanding when GROUP BY runs helps avoid common errors:

  1. FROM - load the table
  2. WHERE - filter individual rows
  3. GROUP BY - collapse rows into groups
  4. SELECT - compute output columns (including aggregates)
  5. ORDER BY - sort the result

Examples

Example 1: Count orders per customer

SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

Result:

customer_idorder_count
13
22
32
42
51

Example 2: Total revenue by status

SELECT
status,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY status;

Result:

statusorder_counttotal_revenue
completed63292.93
pending31073.85
cancelled1349.00

Example 3: Group by multiple columns

SELECT
p.category,
o.status,
COUNT(*) AS order_count
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category, o.status;

Result:

categorystatusorder_count
Electronicscompleted4
Electronicspending1
Furniturecancelled1
Furniturecompleted2
Stationerypending2

Example 4: Using positional references in GROUP BY

Instead of repeating column names, you can reference them by their position in the SELECT list. 1 means the first column, 2 means the second, and so on.

SELECT
category,
COUNT(*) AS product_count,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products
GROUP BY 1; -- 1 refers to category, the first column in SELECT

Result:

categoryproduct_countcheapestmost_expensive
Electronics229.99999.00
Furniture2199.00349.00
Stationery14.994.99

Positional references also work with multiple columns: GROUP BY 1, 2 groups by the first and second columns. This shorthand is common in practice but can reduce readability when the query is long or shared with others.

Common Mistake

Every non-aggregated column in SELECT must appear in GROUP BY

If a column appears in SELECT but is not wrapped in an aggregate function, it must also be listed in GROUP BY. Forgetting this causes an error.

-- This fails in most databases
SELECT
customer_id,
order_date, -- not aggregated, not in GROUP BY
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

-- Fix: either add order_date to GROUP BY or remove it from SELECT
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;

Practice

Write a query that counts the number of orders and calculates the total revenue for each product_id.

Show answer
SELECT
product_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_revenue
FROM orders
GROUP BY product_id;

Expected result:

product_idorder_counttotal_revenue
132997.00
41796.00
32698.00
22149.95
5274.85
Donate to this project