共计 7474 个字符,预计需要花费 19 分钟才能阅读完成。
- JOIN 深水区(避免倍增 / 全外连接替代)
- 子查询 vs CTE(WITH)与常见排行榜写法
- 窗口函数(分组内排名、滚动与累积)
- 事务与隔离级别(InnoDB 锁的正确打开方式)
- 索引优化(联合索引、覆盖索引、函数索引)
- EXPLAIN / EXPLAIN ANALYZE 解读
- 分页与大表查询优化(Keyset Pagination)
- 规范化 vs 反规范化(取舍)
- 小型数据集扩充脚本
- 练习与参考答案(可直接运行)
继续使用第一课中的
shop数据库与四张表(users / products / orders / order_items)。以下示例默认 MySQL 8.0。
1) JOIN 深水区
1.1 避免倍增:先聚合再连接
反例: 直接把 orders 与 order_items JOIN 后聚合,很容易因为一单多明细导致金额倍增。
-- 反例:GMV 可能被放大
SELECT u.user_id, u.name, SUM(o.amount) AS revenue
FROM users u
JOIN orders o ON u.user_id = o.user_id
JOIN order_items i ON o.order_id = i.order_id
GROUP BY u.user_id, u.name;
正确写法: 在明细表先按订单汇总,再与订单或用户回连。
WITH item_gmv AS (SELECT order_id, SUM(quantity * unit_price) AS gmv
FROM order_items
GROUP BY order_id
)
SELECT u.user_id, u.name, COALESCE(SUM(g.gmv),0) AS revenue
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id AND o.status='paid'
LEFT JOIN item_gmv g ON o.order_id = g.order_id
GROUP BY u.user_id, u.name
ORDER BY revenue DESC;
1.2 半连接 / 反连接(找有 / 没有发生过的用户)
-- 有过下单的用户(半连接存在性)SELECT DISTINCT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id AND o.status='paid');
-- 从未下单的用户(反连接)SELECT u.*
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id);
1.3 模拟 FULL OUTER JOIN(MySQL 无原生支持)
把左外连接结果与右外连接中“左表缺失”的部分合并:
-- 以用户与订单为例:SELECT u.user_id, u.name, o.order_id, o.amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
UNION ALL
SELECT u.user_id, u.name, o.order_id, o.amount
FROM orders o
LEFT JOIN users u ON u.user_id = o.user_id
WHERE u.user_id IS NULL; -- 右边多出来的
2) 子查询 vs CTE(WITH)与排行榜
2.1 CTE 让复杂查询“分步走”
WITH paid AS (SELECT * FROM orders WHERE status='paid'), user_rev AS (SELECT user_id, SUM(amount) AS revenue
FROM paid GROUP BY user_id
)
SELECT u.user_id, u.name, r.revenue
FROM users u
LEFT JOIN user_rev r USING(user_id)
ORDER BY r.revenue DESC;
2.2 相关子查询的坑与替代
相关子查询会对外层每行执行一次,容易 N×M;优先改写为 JOIN 或窗口。
-- 相关子查询:每个用户计算一次最大订单额(可能慢)SELECT u.user_id,
(SELECT MAX(o.amount) FROM orders o WHERE o.user_id=u.user_id) AS max_amt
FROM users u;
-- JOIN + GROUP BY 更友好
SELECT u.user_id, MAX(o.amount) AS max_amt
FROM users u LEFT JOIN orders o ON u.user_id=o.user_id
GROUP BY u.user_id;
2.3 排行榜:全局 TopN 与分组 TopN
全局 Top 10 订单
SELECT order_id, user_id, amount
FROM orders
WHERE status='paid'
ORDER BY amount DESC
LIMIT 10;
分组 Top 3(每个用户金额前三单) —— 窗口函数最佳:
WITH ranked AS (SELECT o.*, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY amount DESC) AS rn
FROM orders o WHERE status='paid'
)
SELECT * FROM ranked WHERE rn <= 3;
3) 窗口函数(Window Functions)
3.1 组内排名、累积与移动平均
-- 组内累计 GMV(按用户时间顺序)SELECT user_id, order_id, created_at, amount,
SUM(amount) OVER(
PARTITION BY user_id ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cum_amount
FROM orders WHERE status='paid';
-- 7 日滚动 GMV(自然日时间窗)SELECT user_id, created_at::date AS d, amount,
SUM(amount) OVER(
PARTITION BY user_id
ORDER BY created_at
RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW
) AS roll7
FROM orders WHERE status='paid';
提示:
RANGE ... INTERVAL n DAY表示按时间范围滚动;ROWS按“行数”滚动。
3.2 同比 / 环比(同一用户维度)
-- 按月 GMV + 与上月差值
WITH m AS (
SELECT user_id,
DATE_FORMAT(created_at,'%Y-%m') AS ym,
SUM(amount) AS gmv
FROM orders WHERE status='paid'
GROUP BY user_id, ym
)
SELECT *,
gmv - LAG(gmv,1,0) OVER(PARTITION BY user_id ORDER BY ym) AS mom_diff
FROM m;
4) 事务与隔离级别(InnoDB)
4.1 四要素回顾
- ACID:原子性、一致性、隔离性、持久性。
- MySQL 默认
autocommit=1。多步业务操作应START TRANSACTION ... COMMIT。
4.2 隔离级别与读现象
- InnoDB 默认 REPEATABLE READ,通过 Next-Key Locks(间隙 + 记录锁) 避免幻读。
READ COMMITTED减少锁竞争,但可能出现不可重复读;SERIALIZABLE最保守,吞吐低。
-- 查看或设置会话隔离级别
SELECT @@transaction_isolation; -- 或 @@tx_isolation
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED; -- 按需切换
4.3 行级锁与一致性读
- 一致性读:普通
SELECT在 RR/RC 级别下走 MVCC,不加锁; - 锁定读:
SELECT ... FOR UPDATE(排他锁)/FOR SHARE(共享锁)用于下单减库存等场景。
START TRANSACTION;
SELECT stock FROM products WHERE product_id=3 FOR UPDATE; -- 锁住该行
UPDATE products SET stock = stock - 1 WHERE product_id=3 AND stock>0;
INSERT INTO orders(user_id,status,amount) VALUES (2,'paid',1699.00);
INSERT INTO order_items(order_id,product_id,quantity,unit_price)
VALUES (LAST_INSERT_ID(),3,1,1699.00);
COMMIT;
最佳实践:尽量走 索引精确定位(避免锁全表 / 大范围间隙),事务要短小,更新顺序一致避免死锁。
5) 索引优化
5.1 联合索引与左前缀法则
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- 可高效支持:-- WHERE user_id = ?
-- WHERE user_id = ? AND created_at >= ?
-- 但 ** 不能仅靠它 ** 高效支持 WHERE created_at >= ?
5.2 覆盖索引(索引即结果)
当 SELECT 的列全部包含在同一个二级索引中时,会显示 Extra=Using index,无需回表。
CREATE INDEX idx_orders_user_amount ON orders(user_id, amount);
EXPLAIN SELECT user_id, amount FROM orders WHERE user_id=1; -- 观察 Extra
5.3 函数索引(兼容两种写法)
A. 直接函数索引(8.0.13+)
CREATE INDEX idx_users_email_lower ON users((LOWER(email)));
-- 然后 WHERE LOWER(email)='alice@example.com' 可走索引
B. 生成列 + 索引(通用)
ALTER TABLE users ADD COLUMN email_l VARCHAR(255) GENERATED ALWAYS AS (LOWER(email)) STORED;
CREATE INDEX idx_users_email_l ON users(email_l);
6) EXPLAIN / EXPLAIN ANALYZE 解读
EXPLAIN
SELECT o.order_id
FROM orders o
WHERE o.user_id=1 AND o.created_at>=DATE_SUB(NOW(), INTERVAL 30 DAY);
关注:
- type:访问类型(
const/ref/range/index/ALL)。ALL= 全表扫描,需要优化。 - key / key_len:命中的索引及长度。
- rows:预估扫描行数(越少越好)。
- Extra:
Using index(覆盖)、Using where(过滤)、Using filesort(无法用索引排序)等。
实际执行耗时(8.0.18+):
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id=1 ORDER BY created_at DESC LIMIT 20;
EXPLAIN不执行,仅估计;EXPLAIN ANALYZE真执行并给出每步耗时。
7) 分页与大表查询
7.1 经典 OFFSET 分页的问题
LIMIT 100000, 20 会扫描并丢弃前 10 万行,代价巨大。
7.2 Keyset Pagination(书签 / 游标法)
-- 第 1 页:SELECT order_id, user_id, amount
FROM orders
WHERE status='paid'
ORDER BY order_id
LIMIT 20;
-- 下一页:带上上一页最后一个 order_id 作为书签
SELECT order_id, user_id, amount
FROM orders
WHERE status='paid' AND order_id > :last_id
ORDER BY order_id
LIMIT 20;
依赖 稳定且可索引 的排序键(最好是自增主键或时间 + 主键联合索引)。
7.3 覆盖索引 + 回表按需
先用覆盖索引取主键,再回表拿大字段:
-- 覆盖索引阶段(轻量)SELECT order_id FROM orders FORCE INDEX(PRIMARY)
WHERE status='paid' AND order_id > :last_id
ORDER BY order_id LIMIT 20;
-- 回表阶段(IN 主键集合)SELECT * FROM orders WHERE order_id IN (...);
8) 规范化 vs 反规范化
- 3NF 规范化:减少冗余与更新异常;适合 OLTP(交易型)。
- 适度反规范化:为查询性能预计算 / 冗余(如订单表冗余
amount、或埋每日汇总表),以写入更复杂换取读取更快。 - 决策点:读写比例、热点查询形态、是否可通过索引 / 缓存解决。
9) 小型数据集扩充脚本(便于实验)
-- 追加一些订单与明细(跨多月)INSERT INTO orders (user_id,status,amount,created_at) VALUES
(1,'paid',199.00, DATE_SUB(NOW(), INTERVAL 40 DAY)),
(2,'paid',259.00, DATE_SUB(NOW(), INTERVAL 33 DAY)),
(3,'paid',1699.00,DATE_SUB(NOW(), INTERVAL 28 DAY)),
(1,'paid',399.00, DATE_SUB(NOW(), INTERVAL 18 DAY)),
(2,'paid',699.00, DATE_SUB(NOW(), INTERVAL 7 DAY)),
(1,'paid',129.00, DATE_SUB(NOW(), INTERVAL 2 DAY));
INSERT INTO order_items(order_id,product_id,quantity,unit_price) VALUES
(LAST_INSERT_ID()-5,1,1,199.00),
(LAST_INSERT_ID()-4,2,1,259.00),
(LAST_INSERT_ID()-3,3,1,1699.00),
(LAST_INSERT_ID()-2,1,1,399.00),
(LAST_INSERT_ID()-1,2,3,233.00),
(LAST_INSERT_ID(), 1,1,129.00);
提示:
LAST_INSERT_ID()在多值插入时仅返回第一行插入的自增值,故示例用相对偏移法演示。生产中请先查询订单 ID 再批量插明细,或在应用层一次性生成映射。
10) 练习与参考答案
练习 1: 统计“近 30 天每位用户的下单数、GMV、客单价(GMV/ 订单数)”,GMV 降序。
WITH recent AS (
SELECT * FROM orders
WHERE status='paid' AND created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
)
SELECT u.user_id, u.name,
COUNT(r.order_id) AS n_orders,
COALESCE(SUM(r.amount),0) AS gmv,
CASE WHEN COUNT(r.order_id)=0 THEN 0 ELSE ROUND(SUM(r.amount)/COUNT(r.order_id),2) END AS aov
FROM users u
LEFT JOIN recent r ON u.user_id=r.user_id
GROUP BY u.user_id, u.name
ORDER BY gmv DESC;
练习 2: 每个用户金额最高的两单及其累计金额。
WITH ranked AS (SELECT o.*, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY amount DESC) AS rn
FROM orders o WHERE status='paid'
)
SELECT user_id,
SUM(CASE WHEN rn<=2 THEN amount ELSE 0 END) AS top2_sum
FROM ranked
GROUP BY user_id
ORDER BY top2_sum DESC;
练习 3: 过去 7 天滚动 GMV 警戒:当日 GMV 与 7 日滚动中位数相比偏离超过 3×IQR 的日期。
-- MySQL 无原生 IQR 窗口函数,可用近似法:P75-P25 通过子查询取分位。-- 这里给出一个思路:先做日汇总,再在应用层计算 IQR 更稳妥。WITH daily AS (SELECT DATE(created_at) AS d, SUM(amount) AS gmv
FROM orders WHERE status='paid'
GROUP BY d
)
SELECT * FROM daily ORDER BY d;
-- 导出后在分析层(Python/R)用 IQR 判异。
练习 4: 为查询 WHERE user_id=? AND created_at>=? ORDER BY created_at 设计最优索引并用 EXPLAIN ANALYZE 验证。
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id=1 AND created_at>=DATE_SUB(NOW(), INTERVAL 90 DAY)
ORDER BY created_at;
练习 5: 实现 Keyset 分页 API 的 SQL 片段(上一页最后一条 order_id 作为游标)。
-- first page
SELECT order_id, user_id, amount
FROM orders WHERE status='paid'
ORDER BY order_id LIMIT 20;
-- next page
SELECT order_id, user_id, amount
FROM orders WHERE status='paid' AND order_id > :last_id
ORDER BY order_id LIMIT 20;
小结
- 复杂查询先“拆步走”(CTE),能读就能调。
- 以窗口函数替代相关子查询,实现分组排名、滚动与累计。
- 事务要短、索引要准、锁要小;EXPLAIN/ANALYZE 常用常新。
- 分页优先 Keyset,覆盖索引 + 回表降低 IO。

