MySQL 入门教程· 第二课

44次阅读
没有评论

共计 7474 个字符,预计需要花费 19 分钟才能阅读完成。

  1. JOIN 深水区(避免倍增 / 全外连接替代)
  2. 子查询 vs CTE(WITH)与常见排行榜写法
  3. 窗口函数(分组内排名、滚动与累积)
  4. 事务与隔离级别(InnoDB 锁的正确打开方式)
  5. 索引优化(联合索引、覆盖索引、函数索引)
  6. EXPLAIN / EXPLAIN ANALYZE 解读
  7. 分页与大表查询优化(Keyset Pagination)
  8. 规范化 vs 反规范化(取舍)
  9. 小型数据集扩充脚本
  10. 练习与参考答案(可直接运行)

继续使用第一课中的 shop 数据库与四张表(users / products / orders / order_items)。以下示例默认 MySQL 8.0。


1) JOIN 深水区

1.1 避免倍增:先聚合再连接

反例: 直接把 ordersorder_items JOIN 后聚合,很容易因为一单多明细导致金额倍增。

-- 反例:GMV 可能被放大
SELECT u.user_id, u.name, SUM(o.amount) AS revenue
FROM users u
JOIN orders o ON u.user_id = o.user_id
JOIN order_items i ON o.order_id = i.order_id
GROUP BY u.user_id, u.name;

正确写法: 在明细表先按订单汇总,再与订单或用户回连。

WITH item_gmv AS (SELECT order_id, SUM(quantity * unit_price) AS gmv
  FROM order_items
  GROUP BY order_id
)
SELECT u.user_id, u.name, COALESCE(SUM(g.gmv),0) AS revenue
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id AND o.status='paid'
LEFT JOIN item_gmv g ON o.order_id = g.order_id
GROUP BY u.user_id, u.name
ORDER BY revenue DESC;

1.2 半连接 / 反连接(找有 / 没有发生过的用户)

-- 有过下单的用户(半连接存在性)SELECT DISTINCT u.*
FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id AND o.status='paid');

-- 从未下单的用户(反连接)SELECT u.*
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.user_id);

1.3 模拟 FULL OUTER JOIN(MySQL 无原生支持)

把左外连接结果与右外连接中“左表缺失”的部分合并:

-- 以用户与订单为例:SELECT u.user_id, u.name, o.order_id, o.amount
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
UNION ALL
SELECT u.user_id, u.name, o.order_id, o.amount
FROM orders o
LEFT JOIN users u ON u.user_id = o.user_id
WHERE u.user_id IS NULL;  -- 右边多出来的

2) 子查询 vs CTE(WITH)与排行榜

2.1 CTE 让复杂查询“分步走”

WITH paid AS (SELECT * FROM orders WHERE status='paid'), user_rev AS (SELECT user_id, SUM(amount) AS revenue
  FROM paid GROUP BY user_id
)
SELECT u.user_id, u.name, r.revenue
FROM users u
LEFT JOIN user_rev r USING(user_id)
ORDER BY r.revenue DESC;

2.2 相关子查询的坑与替代

相关子查询会对外层每行执行一次,容易 N×M;优先改写为 JOIN 或窗口。

-- 相关子查询:每个用户计算一次最大订单额(可能慢)SELECT u.user_id,
       (SELECT MAX(o.amount) FROM orders o WHERE o.user_id=u.user_id) AS max_amt
FROM users u;

-- JOIN + GROUP BY 更友好
SELECT u.user_id, MAX(o.amount) AS max_amt
FROM users u LEFT JOIN orders o ON u.user_id=o.user_id
GROUP BY u.user_id;

2.3 排行榜:全局 TopN 与分组 TopN

全局 Top 10 订单

SELECT order_id, user_id, amount
FROM orders
WHERE status='paid'
ORDER BY amount DESC
LIMIT 10;

分组 Top 3(每个用户金额前三单) —— 窗口函数最佳:

WITH ranked AS (SELECT o.*, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY amount DESC) AS rn
  FROM orders o WHERE status='paid'
)
SELECT * FROM ranked WHERE rn <= 3;

3) 窗口函数(Window Functions)

3.1 组内排名、累积与移动平均

-- 组内累计 GMV(按用户时间顺序)SELECT user_id, order_id, created_at, amount,
       SUM(amount) OVER(
         PARTITION BY user_id ORDER BY created_at
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cum_amount
FROM orders WHERE status='paid';

-- 7 日滚动 GMV(自然日时间窗)SELECT user_id, created_at::date AS d, amount,
       SUM(amount) OVER(
         PARTITION BY user_id
         ORDER BY created_at
         RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW
       ) AS roll7
FROM orders WHERE status='paid';

提示:RANGE ... INTERVAL n DAY 表示按时间范围滚动;ROWS 按“行数”滚动。

3.2 同比 / 环比(同一用户维度)

-- 按月 GMV + 与上月差值
WITH m AS (
  SELECT user_id,
         DATE_FORMAT(created_at,'%Y-%m') AS ym,
         SUM(amount) AS gmv
  FROM orders WHERE status='paid'
  GROUP BY user_id, ym
)
SELECT *,
       gmv - LAG(gmv,1,0) OVER(PARTITION BY user_id ORDER BY ym) AS mom_diff
FROM m;

4) 事务与隔离级别(InnoDB)

4.1 四要素回顾

  • ACID:原子性、一致性、隔离性、持久性。
  • MySQL 默认 autocommit=1。多步业务操作应 START TRANSACTION ... COMMIT

4.2 隔离级别与读现象

  • InnoDB 默认 REPEATABLE READ,通过 Next-Key Locks(间隙 + 记录锁) 避免幻读。
  • READ COMMITTED 减少锁竞争,但可能出现不可重复读;SERIALIZABLE 最保守,吞吐低。
-- 查看或设置会话隔离级别
SELECT @@transaction_isolation; -- 或 @@tx_isolation
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED; -- 按需切换

4.3 行级锁与一致性读

  • 一致性读:普通 SELECT 在 RR/RC 级别下走 MVCC,不加锁;
  • 锁定读SELECT ... FOR UPDATE(排他锁)/FOR SHARE(共享锁)用于下单减库存等场景。
START TRANSACTION;
  SELECT stock FROM products WHERE product_id=3 FOR UPDATE; -- 锁住该行
  UPDATE products SET stock = stock - 1 WHERE product_id=3 AND stock>0;
  INSERT INTO orders(user_id,status,amount) VALUES (2,'paid',1699.00);
  INSERT INTO order_items(order_id,product_id,quantity,unit_price)
    VALUES (LAST_INSERT_ID(),3,1,1699.00);
COMMIT;

最佳实践:尽量走 索引精确定位(避免锁全表 / 大范围间隙),事务要短小,更新顺序一致避免死锁。


5) 索引优化

5.1 联合索引与左前缀法则

CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
-- 可高效支持:-- WHERE user_id = ?
-- WHERE user_id = ? AND created_at >= ?
-- 但 ** 不能仅靠它 ** 高效支持 WHERE created_at >= ?

5.2 覆盖索引(索引即结果)

SELECT 的列全部包含在同一个二级索引中时,会显示 Extra=Using index,无需回表。

CREATE INDEX idx_orders_user_amount ON orders(user_id, amount);
EXPLAIN SELECT user_id, amount FROM orders WHERE user_id=1;  -- 观察 Extra

5.3 函数索引(兼容两种写法)

A. 直接函数索引(8.0.13+)

CREATE INDEX idx_users_email_lower ON users((LOWER(email)));
-- 然后 WHERE LOWER(email)='alice@example.com' 可走索引

B. 生成列 + 索引(通用)

ALTER TABLE users ADD COLUMN email_l VARCHAR(255) GENERATED ALWAYS AS (LOWER(email)) STORED;
CREATE INDEX idx_users_email_l ON users(email_l);

6) EXPLAIN / EXPLAIN ANALYZE 解读

EXPLAIN
SELECT o.order_id
FROM orders o
WHERE o.user_id=1 AND o.created_at>=DATE_SUB(NOW(), INTERVAL 30 DAY);

关注:

  • type:访问类型(const/ref/range/index/ALL)。ALL= 全表扫描,需要优化。
  • key / key_len:命中的索引及长度。
  • rows:预估扫描行数(越少越好)。
  • ExtraUsing index(覆盖)、Using where(过滤)、Using filesort(无法用索引排序)等。

实际执行耗时(8.0.18+):

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id=1 ORDER BY created_at DESC LIMIT 20;

EXPLAIN 不执行,仅估计;EXPLAIN ANALYZE 真执行并给出每步耗时。


7) 分页与大表查询

7.1 经典 OFFSET 分页的问题

LIMIT 100000, 20 会扫描并丢弃前 10 万行,代价巨大。

7.2 Keyset Pagination(书签 / 游标法)

-- 第 1 页:SELECT order_id, user_id, amount
FROM orders
WHERE status='paid'
ORDER BY order_id
LIMIT 20;

-- 下一页:带上上一页最后一个 order_id 作为书签
SELECT order_id, user_id, amount
FROM orders
WHERE status='paid' AND order_id > :last_id
ORDER BY order_id
LIMIT 20;

依赖 稳定且可索引 的排序键(最好是自增主键或时间 + 主键联合索引)。

7.3 覆盖索引 + 回表按需

先用覆盖索引取主键,再回表拿大字段:

-- 覆盖索引阶段(轻量)SELECT order_id FROM orders FORCE INDEX(PRIMARY)
WHERE status='paid' AND order_id > :last_id
ORDER BY order_id LIMIT 20;
-- 回表阶段(IN 主键集合)SELECT * FROM orders WHERE order_id IN (...);

8) 规范化 vs 反规范化

  • 3NF 规范化:减少冗余与更新异常;适合 OLTP(交易型)。
  • 适度反规范化:为查询性能预计算 / 冗余(如订单表冗余 amount、或埋每日汇总表),以写入更复杂换取读取更快。
  • 决策点:读写比例、热点查询形态、是否可通过索引 / 缓存解决。

9) 小型数据集扩充脚本(便于实验)

-- 追加一些订单与明细(跨多月)INSERT INTO orders (user_id,status,amount,created_at) VALUES
 (1,'paid',199.00, DATE_SUB(NOW(), INTERVAL 40 DAY)),
 (2,'paid',259.00, DATE_SUB(NOW(), INTERVAL 33 DAY)),
 (3,'paid',1699.00,DATE_SUB(NOW(), INTERVAL 28 DAY)),
 (1,'paid',399.00, DATE_SUB(NOW(), INTERVAL 18 DAY)),
 (2,'paid',699.00, DATE_SUB(NOW(), INTERVAL 7 DAY)),
 (1,'paid',129.00, DATE_SUB(NOW(), INTERVAL 2 DAY));

INSERT INTO order_items(order_id,product_id,quantity,unit_price) VALUES
 (LAST_INSERT_ID()-5,1,1,199.00),
 (LAST_INSERT_ID()-4,2,1,259.00),
 (LAST_INSERT_ID()-3,3,1,1699.00),
 (LAST_INSERT_ID()-2,1,1,399.00),
 (LAST_INSERT_ID()-1,2,3,233.00),
 (LAST_INSERT_ID(), 1,1,129.00);

提示:LAST_INSERT_ID() 在多值插入时仅返回第一行插入的自增值,故示例用相对偏移法演示。生产中请先查询订单 ID 再批量插明细,或在应用层一次性生成映射。


10) 练习与参考答案

练习 1: 统计“近 30 天每位用户的下单数、GMV、客单价(GMV/ 订单数)”,GMV 降序。

WITH recent AS (
  SELECT * FROM orders
  WHERE status='paid' AND created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
)
SELECT u.user_id, u.name,
       COUNT(r.order_id) AS n_orders,
       COALESCE(SUM(r.amount),0) AS gmv,
       CASE WHEN COUNT(r.order_id)=0 THEN 0 ELSE ROUND(SUM(r.amount)/COUNT(r.order_id),2) END AS aov
FROM users u
LEFT JOIN recent r ON u.user_id=r.user_id
GROUP BY u.user_id, u.name
ORDER BY gmv DESC;

练习 2: 每个用户金额最高的两单及其累计金额。

WITH ranked AS (SELECT o.*, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY amount DESC) AS rn
  FROM orders o WHERE status='paid'
)
SELECT user_id,
       SUM(CASE WHEN rn<=2 THEN amount ELSE 0 END) AS top2_sum
FROM ranked
GROUP BY user_id
ORDER BY top2_sum DESC;

练习 3: 过去 7 天滚动 GMV 警戒:当日 GMV 与 7 日滚动中位数相比偏离超过 3×IQR 的日期。

-- MySQL 无原生 IQR 窗口函数,可用近似法:P75-P25 通过子查询取分位。-- 这里给出一个思路:先做日汇总,再在应用层计算 IQR 更稳妥。WITH daily AS (SELECT DATE(created_at) AS d, SUM(amount) AS gmv
  FROM orders WHERE status='paid'
  GROUP BY d
)
SELECT * FROM daily ORDER BY d;
-- 导出后在分析层(Python/R)用 IQR 判异。

练习 4: 为查询 WHERE user_id=? AND created_at>=? ORDER BY created_at 设计最优索引并用 EXPLAIN ANALYZE 验证。

CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id=1 AND created_at>=DATE_SUB(NOW(), INTERVAL 90 DAY)
ORDER BY created_at;

练习 5: 实现 Keyset 分页 API 的 SQL 片段(上一页最后一条 order_id 作为游标)。

-- first page
SELECT order_id, user_id, amount
FROM orders WHERE status='paid'
ORDER BY order_id LIMIT 20;
-- next page
SELECT order_id, user_id, amount
FROM orders WHERE status='paid' AND order_id > :last_id
ORDER BY order_id LIMIT 20;

小结

  • 复杂查询先“拆步走”(CTE),能读就能调。
  • 以窗口函数替代相关子查询,实现分组排名、滚动与累计。
  • 事务要短、索引要准、锁要小;EXPLAIN/ANALYZE 常用常新。
  • 分页优先 Keyset,覆盖索引 + 回表降低 IO。

正文完
 0
一诺
版权声明:本站原创文章,由 一诺 于2025-10-02发表,共计7474字。
转载说明:除特殊说明外本站文章皆由CC-4.0协议发布,转载请注明出处。
评论(没有评论)
验证码