MySQL 入门教程 · 第六课

30次阅读
没有评论

共计 8142 个字符,预计需要花费 21 分钟才能阅读完成。

目录

  1. 递归 CTE:层级 / 树型数据(类目 / 组织)
  2. 层级汇总:含子类的 GMV 统计(祖先映射法)
  3. ROLLUP/ 分组小计:多维报表一条 SQL
  4. 动态透视(Pivot):条件聚合与动态 SQL
  5. 留存与分群报表:窗口函数 Cohort 矩阵
  6. 时间区间连接:有效期价格 / 会员等级匹配
  7. 优化器 Trace:optimizer_trace 的开启与阅读
  8. 物化视图(模拟):增量汇总与幂等回填
  9. 数据质量 SQL:孤儿行 / 重复键 / 范围异常检查
  10. 实战练习(含参考答案)

版本:MySQL 8.0。延续 shop 库。示例可直接复用第一课~第五课数据结构。


1) 递归 CTE:层级 / 树型数据(类目 / 组织)

表结构(商品类目,父子关系):

CREATE TABLE IF NOT EXISTS categories (
  cat_id   BIGINT PRIMARY KEY,
  name     VARCHAR(100) NOT NULL,
  parent_id BIGINT NULL,
  KEY idx_parent(parent_id),
  CONSTRAINT fk_cat_parent FOREIGN KEY (parent_id) REFERENCES categories(cat_id)
);

插入示例

INSERT INTO categories(cat_id,name,parent_id) VALUES
 (1,'电子',NULL), (2,'外设',1), (3,'键盘',2), (4,'鼠标',2), (5,'显示器',1);

递归 CTE 展开整棵树(含根到叶的路径、深度):

WITH RECURSIVE cat_tree AS (
  -- 根层
  SELECT cat_id, parent_id, name,
         CAST(cat_id AS CHAR(200)) AS path,
         1 AS depth
  FROM categories WHERE parent_id IS NULL
  UNION ALL
  -- 向下展开
  SELECT c.cat_id, c.parent_id, c.name,
         CONCAT(ct.path, ',', c.cat_id) AS path,
         ct.depth + 1 AS depth
  FROM categories c
  JOIN cat_tree ct ON c.parent_id = ct.cat_id
  -- 简易环路保护(深度阈值 + 非严格检测)WHERE ct.depth < 32 AND FIND_IN_SET(c.cat_id, ct.path) = 0
)
SELECT * FROM cat_tree ORDER BY path;

要点:FIND_IN_SET 结合 path 字符串可防简单环;深度阈值避免意外自循环。


2) 层级汇总:含子类的 GMV 统计(祖先映射法)

需求:统计每个类目 包含其所有子类 的 GMV。

思路:先构造“每个节点的所有祖先”映射表,再把明细按类目 → 祖先映射累加。

-- 2.1 构造 ancestor 映射(child → ancestor)WITH RECURSIVE anc AS (
  SELECT cat_id AS child, cat_id AS ancestor FROM categories
  UNION ALL
  SELECT c.cat_id AS child, a.ancestor
  FROM categories c
  JOIN anc a ON c.parent_id = a.child
)
SELECT * FROM anc;  -- child, ancestor

-- 2.2 订单明细加入类目(假设 products(cat_id) 已存在)--     先把订单 GMV 摘出来
WITH item_gmv AS (SELECT oi.order_id, p.cat_id, SUM(oi.quantity * oi.unit_price) AS gmv
  FROM order_items oi JOIN products p ON oi.product_id = p.product_id
  GROUP BY oi.order_id, p.cat_id
),
anc AS (
  SELECT cat_id AS child, cat_id AS ancestor FROM categories
  UNION ALL
  SELECT c.cat_id AS child, a.ancestor
  FROM categories c JOIN anc a ON c.parent_id = a.child
)
SELECT a.ancestor AS cat_id, SUM(g.gmv) AS gmv_with_children
FROM item_gmv g
JOIN anc a ON g.cat_id = a.child
GROUP BY a.ancestor
ORDER BY gmv_with_children DESC;

祖先映射法允许你一次性计算“含子类”的指标;需要时再按时间分组即可。


3) ROLLUP/ 分组小计:多维报表一条 SQL

按渠道 × 月份汇总,并输出小计与总计

WITH paid AS (
  SELECT channel,
         DATE_FORMAT(created_at,'%Y-%m') AS ym,
         amount
  FROM orders WHERE status='paid'
)
SELECT channel, ym,
       SUM(amount) AS gmv
FROM paid
GROUP BY channel, ym WITH ROLLUP
ORDER BY channel IS NULL, channel, ym IS NULL, ym;

WITH ROLLUP 生成分组小计与总计。ORDER BY ... IS NULL 让总计行排在末尾。必要时用 IFNULL(channel,'ALL') 等友好显示。


4) 动态透视(Pivot):条件聚合与动态 SQL

静态列(已知几个月)

SELECT channel,
  SUM(CASE WHEN DATE_FORMAT(created_at,'%Y-%m')='2025-08' THEN amount ELSE 0 END) AS m_2025_08,
  SUM(CASE WHEN DATE_FORMAT(created_at,'%Y-%m')='2025-09' THEN amount ELSE 0 END) AS m_2025_09,
  SUM(CASE WHEN DATE_FORMAT(created_at,'%Y-%m')='2025-10' THEN amount ELSE 0 END) AS m_2025_10
FROM orders WHERE status='paid'
GROUP BY channel;

动态列(月份未知):用预处理构造 SQL。

SET @sql = (
  SELECT GROUP_CONCAT(
    DISTINCT CONCAT('SUM(CASE WHEN DATE_FORMAT(created_at,\'%Y-%m\')=\'', ym, '\' THEN amount ELSE 0 END) AS `', ym,'`') ORDER BY ym SEPARATOR',\n'
  )
  FROM (SELECT DISTINCT DATE_FORMAT(created_at,'%Y-%m') AS ym
    FROM orders WHERE status='paid'
  ) t
);
SET @sql = CONCAT('SELECT channel,', @sql, 'FROM orders WHERE status=\'paid\'GROUP BY channel');
PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;

5) 留存与分群报表:窗口函数 Cohort 矩阵

目标:按“用户首单月份”分 cohort,计算后续每月是否复购。

-- 5.1 计算用户首单月
WITH first_month AS (
  SELECT user_id,
         DATE_FORMAT(MIN(created_at),'%Y-%m') AS cohort
  FROM orders WHERE status='paid'
  GROUP BY user_id
),
user_month AS (
  SELECT o.user_id,
         DATE_FORMAT(o.created_at,'%Y-%m') AS ym
  FROM orders o WHERE o.status='paid'
)
SELECT fm.cohort, um.ym,
       COUNT(DISTINCT um.user_id) AS buyers,
       TIMESTAMPDIFF(MONTH, STR_TO_DATE(CONCAT(fm.cohort,'-01'),'%Y-%m-%d'),
                           STR_TO_DATE(CONCAT(um.ym,'-01'),'%Y-%m-%d')) AS month_offset
FROM first_month fm JOIN user_month um USING(user_id)
GROUP BY fm.cohort, um.ym
ORDER BY fm.cohort, um.ym;

转矩阵(列为月偏移)

SELECT cohort,
  SUM(CASE WHEN month_offset=0 THEN buyers ELSE 0 END) AS m0,
  SUM(CASE WHEN month_offset=1 THEN buyers ELSE 0 END) AS m1,
  SUM(CASE WHEN month_offset=2 THEN buyers ELSE 0 END) AS m2,
  SUM(CASE WHEN month_offset=3 THEN buyers ELSE 0 END) AS m3
FROM (
  -- 把上一查询作为子表
  SELECT fm.cohort, um.ym,
         COUNT(DISTINCT um.user_id) AS buyers,
         TIMESTAMPDIFF(MONTH, STR_TO_DATE(CONCAT(fm.cohort,'-01'),'%Y-%m-%d'),
                             STR_TO_DATE(CONCAT(um.ym,'-01'),'%Y-%m-%d')) AS month_offset
  FROM (SELECT user_id, DATE_FORMAT(MIN(created_at),'%Y-%m') AS cohort
    FROM orders WHERE status='paid' GROUP BY user_id
  ) fm
  JOIN (SELECT user_id, DATE_FORMAT(created_at,'%Y-%m') AS ym
    FROM orders WHERE status='paid'
  ) um USING(user_id)
  GROUP BY fm.cohort, um.ym
) t
GROUP BY cohort
ORDER BY cohort;

需要转百分比可用 m1/m0 等;更灵活的“动态列”可用 §4 的动态 SQL 技巧。


6) 时间区间连接:有效期价格 / 会员等级匹配

场景:给订单匹配下单时有效的价格或会员等级。

-- 价格表(有效区间)CREATE TABLE IF NOT EXISTS price_history (
  product_id BIGINT,
  start_ts   DATETIME NOT NULL,
  end_ts     DATETIME NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  PRIMARY KEY(product_id, start_ts),
  KEY idx_end(end_ts)
);

-- 匹配订单行对应的有效价格(区间连接)SELECT oi.order_id, oi.product_id, oi.quantity,
       ph.price AS price_effective
FROM order_items oi
JOIN orders o ON oi.order_id=o.order_id
JOIN price_history ph
  ON ph.product_id=oi.product_id
 AND o.created_at >= ph.start_ts
 AND o.created_at <  ph.end_ts;

优化建议:对 (product_id, start_ts) 建复合索引并保证 end_ts 有索引,查询加 product_id 精确条件可显著减少扫描。


7) 优化器 Trace:optimizer_trace 的开启与阅读

-- 仅在会话级开启
SET optimizer_trace='enabled=on', end_markers_in_json=on;
-- 运行你的查询
SELECT /* test */ * FROM orders WHERE user_id=1 AND created_at>=NOW()-INTERVAL 30 DAY;
-- 查看 JSON 轨迹(包含访问路径评估、代价计算、选择性估计等)SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE\G
-- 关闭
SET optimizer_trace='enabled=off';

使用场景:解释为什么没走某索引、为什么选择 Nested Loop/ 临时表等。注意 trace 较重,只在排查时短期开启。


8) 物化视图(模拟):增量汇总与幂等回填

目标:维护 metrics_daily(d, gmv)

CREATE TABLE IF NOT EXISTS metrics_daily (
  d    DATE PRIMARY KEY,
  gmv  DECIMAL(18,2) NOT NULL DEFAULT 0
);

-- 增量 upsert(每天跑一次;或事件调度器见第四课)INSERT INTO metrics_daily(d, gmv)
SELECT DATE(created_at) AS d, SUM(amount) AS gmv
FROM orders WHERE status='paid' AND created_at >= CURDATE()-INTERVAL 7 DAY
GROUP BY d
ON DUPLICATE KEY UPDATE gmv=VALUES(gmv);

幂等回填(历史整段)

-- 先删除目标区间再重算,避免累加污染
DELETE FROM metrics_daily WHERE d BETWEEN '2025-07-01' AND '2025-10-11';
INSERT INTO metrics_daily(d, gmv)
SELECT DATE(created_at) AS d, SUM(amount)
FROM orders WHERE status='paid'
  AND created_at >= '2025-07-01' AND created_at < '2025-10-12'
GROUP BY d
ON DUPLICATE KEY UPDATE gmv=VALUES(gmv);

原则:先删后插 + ON DUPLICATE KEY UPDATE,保证多次执行结果一致。


9) 数据质量 SQL:孤儿行 / 重复键 / 范围异常检查

-- 9.1 孤儿订单(无对应用户)SELECT o.order_id FROM orders o
LEFT JOIN users u ON o.user_id=u.user_id
WHERE u.user_id IS NULL;

-- 9.2 重复邮箱(不唯一)SELECT email, COUNT(*) c FROM users
GROUP BY email HAVING c>1;

-- 9.3 金额异常(负值 / 极端值)SELECT * FROM orders WHERE amount<0
UNION ALL
SELECT * FROM (SELECT *, NTILE(100) OVER(ORDER BY amount) AS pct
  FROM orders WHERE status='paid'
) t WHERE pct>=100; -- 近似 P100 长尾检查

10) 实战练习(含参考答案)

练习 1(层级汇总):统计每个“一级类目”包含所有子类的 GMV,并按 GMV 降序。

WITH RECURSIVE anc AS (
  SELECT cat_id AS child, cat_id AS ancestor FROM categories
  UNION ALL
  SELECT c.cat_id AS child, a.ancestor
  FROM categories c JOIN anc a ON c.parent_id = a.child
),
item_gmv AS (SELECT oi.order_id, p.cat_id, SUM(oi.quantity*oi.unit_price) AS gmv
  FROM order_items oi JOIN products p ON oi.product_id=p.product_id
  GROUP BY oi.order_id, p.cat_id
)
SELECT a.ancestor AS cat_id, SUM(g.gmv) AS gmv_all
FROM item_gmv g JOIN anc a ON g.cat_id=a.child
GROUP BY a.ancestor
ORDER BY gmv_all DESC;

练习 2(ROLLUP 报表):输出 渠道×月份 GMV,小计与总计。

SELECT channel, DATE_FORMAT(created_at,'%Y-%m') AS ym, SUM(amount) AS gmv
FROM orders WHERE status='paid'
GROUP BY channel, ym WITH ROLLUP
ORDER BY channel IS NULL, channel, ym IS NULL, ym;

练习 3(区间连接):将 price_historyorders 连接,求每笔订单的“下单时价格”。

SELECT o.order_id, oi.product_id, ph.price
FROM orders o JOIN order_items oi ON o.order_id=oi.order_id
JOIN price_history ph
  ON ph.product_id=oi.product_id
 AND o.created_at>=ph.start_ts AND o.created_at<ph.end_ts;

练习 4(Cohort 留存):输出 cohort(首单月)在 m0~m3 的复购人数矩阵。

WITH fm AS (SELECT user_id, DATE_FORMAT(MIN(created_at),'%Y-%m') AS cohort
  FROM orders WHERE status='paid' GROUP BY user_id
), um AS (SELECT user_id, DATE_FORMAT(created_at,'%Y-%m') AS ym
  FROM orders WHERE status='paid'
), base AS (
  SELECT fm.cohort, um.ym,
         TIMESTAMPDIFF(MONTH, STR_TO_DATE(CONCAT(fm.cohort,'-01'),'%Y-%m-%d'),
                             STR_TO_DATE(CONCAT(um.ym,'-01'),'%Y-%m-%d')) AS mo,
         COUNT(DISTINCT um.user_id) AS buyers
  FROM fm JOIN um USING(user_id)
  GROUP BY fm.cohort, um.ym
)
SELECT cohort,
  SUM(CASE WHEN mo=0 THEN buyers ELSE 0 END) AS m0,
  SUM(CASE WHEN mo=1 THEN buyers ELSE 0 END) AS m1,
  SUM(CASE WHEN mo=2 THEN buyers ELSE 0 END) AS m2,
  SUM(CASE WHEN mo=3 THEN buyers ELSE 0 END) AS m3
FROM base GROUP BY cohort ORDER BY cohort;

练习 5(optimizer_trace):开启 trace,解释为何以下查询没有用到某个索引,并给出可索引化改写。

SET optimizer_trace='enabled=on';
SELECT * FROM orders WHERE DATE(created_at)='2025-10-01'; -- 可能未走索引
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE\G
SET optimizer_trace='enabled=off';
-- 改写:SELECT * FROM orders WHERE created_at>='2025-10-01' AND created_at<'2025-10-02';

小结

  • 递归 CTE + 祖先映射解决“含子类”的统计与树遍历。
  • WITH ROLLUP 直出小计 / 总计;动态透视靠条件聚合或动态 SQL。
  • Cohort 报表 = 首单分组 + 月偏移;区间连接需好索引与谓词顺序。
  • optimizer_trace 帮你理解计划选择;物化汇总要 幂等 增量

正文完
 0
一诺
版权声明:本站原创文章,由 一诺 于2025-10-06发表,共计8142字。
转载说明:除特殊说明外本站文章皆由CC-4.0协议发布,转载请注明出处。
评论(没有评论)
验证码