EXPLAIN
与desc同义,该命令用于查看MySQL的执行计划,分析SQL语句的执行过程以及优化.
EXPLAIN SELECT 1;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
列名 | 描述 |
---|---|
id | 序号,在一个大的查询语句中每个SELECT关键字都对应一个唯一的id |
select_type | 查询的类型 |
table | 表名 |
partitions | 匹配的分区信息 |
type | 针对单表的访问方法 |
possible_keys | 可能用到的索引 |
key | 实际上使用的索引 |
key_len | 实际使用到的索引长度 |
ref | 当使用索引列等值查询时,与索引列进行等值匹配的对象信息 |
rows | 预估的需要读取的记录条数 |
filtered | 某个表经过搜索条件过滤后剩余记录条数的百分比 |
Extra | 一些额外的信息 |
id
select查询序号,id相同时执行顺序自上而下,id不同时从高到低执行.
不是说有多少个select就有多少个id,连接查询没有select但是也会扫描多个表,它们id相同,前面的是驱动表,后面的是被驱动表.
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.m1=t2.m2;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
如果是临时表这里会显示null.比如union最后会开个临时表对数据去重.
EXPLAIN SELECT * FROM t1 UNION SELECT * FROM t2;
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
| 2 | UNION | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
| NULL | UNION RESULT | <union1,2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
select_type
查询类型,用于指明该select在整个SQL中扮演什么角色.
名称 | 描述 |
---|---|
SIMPLE (简单查询) | Simple SELECT (not using UNION or subqueries) |
PRIMARY (主要查询) | Outermost SELECT |
UNION (联合查询) | Second or later SELECT statement in a UNION |
UNION RESULT (联合结果查询) | Result of a UNION |
SUBQUERY (子查询) | First SELECT in subquery |
DEPENDENT SUBQUERY (相关子查询) | First SELECT in subquery, dependent on outer query |
DEPENDENT UNION (相关联合查询) | Second or later SELECT statement in a UNION, dependent on outer query |
DERIVED (派生查询) | Derived table |
MATERIALIZED (物化表查询) | Materialized subquery |
UNCACHEABLE SUBQUERY (当前子查询) | A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query |
UNCACHEABLE UNION (当前联合查询) | The second or later select in a UNION that belongs to an uncacheable subquery (see UNCACHEABLE SUBQUERY) |
可以看到子查询的类型比较多,它也比较复杂,我暂时也还没看完所以这里就先不深入了. |
-
SIMPLE
SQL中不包含union和子查询的都算简单查询. 当然,join也算简单查询.
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.m1=t2.m2;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+
-
PRIMARY
主查询,对于包含union/union all/子查询的查询(可以称为复杂查询)而言,它是由几个小查询组成,其中最左边的查询类型就是PRIMARY.
EXPLAIN SELECT * FROM t1 UNION SELECT * FROM t2;
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
| 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | NULL |
| 2 | UNION | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | NULL |
| NULL | UNION RESULT | <union1,2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+-----------------+
-
UNION
union里面位于主查询之后的查询.
-
UNION RESULT
MySQL使用临时表来完成union查询结果的去重工作,针对该临时表的查询类型.
-
SUBQUERY
子查询,只会执行一次.
-
DEPENDT SUBQUERY
相关子查询,与普通的子查询不同,执行过程依赖于外层变量,因此对于外层查询的每一条结果都会执行一次,效率极低,是优化的重点.
-
DEPENDENT UNION
和union的区别在于执行过程依赖外层查询.
EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2 WHERE key1 = 'a' UNION SELECT key1 FROM s1 WHERE key1 = 'b');
+----+--------------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+--------------------------+
| 1 | PRIMARY | s1 | NULL | ALL | NULL | NULL | NULL | NULL | 97124 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | s2 | NULL | ref | idx_key1 | idx_key1 | 303 | const | 1 | 100.00 | Using where; Using index |
| 3 | DEPENDENT UNION | s1 | NULL | ref | idx_key1 | idx_key1 | 303 | const | 1 | 100.00 | Using where; Using index |
| NULL | UNION RESULT | <union2,3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+--------------------------+
上面的例子中同时出现了DEPENDENT SUBQUERY和DEPENDENT UNION,但是它们都没有依赖外层变量,单独执行的时候都是简单查询,这个和定义的不一样.
EXPLAIN SELECT * FROM s1 WHERE key1 IN (SELECT key1 FROM s2 WHERE key1 = 'a' );
如果把SQL子查询的union拿掉,子查询的类型又会变回SIMPLE,这我暂时还没完全弄懂,先留个坑.
-
DERIVED
派生子查询.如果在from中出现子查询,MySQL会创建派生表,用于形成派生表的查询即派生子查询.
EXPLAIN SELECT * FROM (SELECT key1, count(*) as c FROM s1 GROUP BY key1) AS derived_s1 where c > 1;
+----+-------------+------------+------------+-------+---------------+----------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+-------+---------------+----------+---------+------+-------+----------+-------------+
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 97124 | 100.00 | NULL |
| 2 | DERIVED | s1 | NULL | index | idx_key1 | idx_key1 | 303 | NULL | 97124 | 100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+----------+---------+------+-------+----------+-------------+
partitions
如果该表进行了分区,就会显示查询了哪个分区.分区功能似乎较少使用,现在数据库压力大的话主流方案都是分库分表.
type
执行查询时的访问方式,包括system ,const ,eq_ref ,ref ,fulltext ,ref_or_null ,index_merge ,unique_subquery ,index_subquery ,range ,index ,ALL. 效率一般由高到低.
-
system
该表只有一行,等于系统表,查询速度最快.
-
const
该表至多只有一行数据匹配,表示对唯一索引的等值查询.
-
eq_ref
与const类似,不过是出现在join查询里面,对于被驱动表的唯一索引的等值查询.
EXPLAIN SELECT * FROM s1 INNER JOIN s2 ON s1.id=s2.id;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------+
| 1 | SIMPLE | s1 | NULL | ALL | NULL | NULL | NULL | NULL | 97124 | 100.00 | NULL |
| 1 | SIMPLE | s2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | stuudy.s1.id | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+-------+----------+-------+
-
ref
该表可能有多行数据匹配,表示对非唯一索引的等值查询(不包括范围查询)
-
fulltext
全文索引/倒排索引
-
ref_or_null
在ref的基础上额外对null进行搜索,也就是该普通索引的索引列可以存null
-
index_merge
索引合并.在5.0以前MySQL对单表的扫描仅能使用一个索引,后来推出了索引合并,如果and和or两边可以分别使用索引,那就分别走索引,再对两边的结果求交集或并集.
EXPLAIN SELECT * FROM s1 WHERE common_field IN (SELECT key3 FROM s2 where s1.key1 = s2.key1) OR key3 = 'a';
+----+--------------------+-------+------------+----------------+-------------------+----------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+----------------+-------------------+----------+---------+------+-------+----------+-------------+
| 1 | PRIMARY | s1 | NULL | ALL | idx_key3 | NULL | NULL | NULL | 97124 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | s2 | NULL | index_subquery | idx_key1,idx_key3 | idx_key3 | 303 | func | 1 | 10.00 | Using where |
+----+--------------------+-------+------------+----------------+-------------------+----------+---------+------+-------+----------+-------------+
索引合并有三种形式: intersection(交集,对应where A and B),union(并集,对应where A or B),还有sort-union.
对于sort-union官方文档是这样解释的:
The difference between the sort-union algorithm and the union algorithm
is that the sort-union algorithm must first fetch row IDs for all rows
and sort them before returning any rows.
排序联合算法和联合算法的区别在于,
排序联合算法必须首先获取所有行的行 ID,
并在返回任何行之前对它们进行排序。
SELECT * FROM tbl_name
WHERE key_col1 < 10 OR key_col2 < 20;
SELECT * FROM tbl_name
WHERE (key_col1 > 10 OR key_col2 = 20) AND nonkey_col = 30;
暂时没看懂具体情景是什么
-
unique_subquery
类似于eq_ref,不过是在相关子查询里面对主键索引进行搜索.形式如value IN (SELECT primary_key FROM single_table WHERE some_expr).虽然官网说unique_subquery是eq_ref在相关子查询里面的特例,但eq_ref是等值查询,而unique_subquery可以是范围查询.
EXPLAIN SELECT * FROM s1 WHERE key2 IN (SELECT id FROM s2 where s1.key1 = s2.key1) OR key3 = 'a';
+----+--------------------+-------+------------+-----------------+------------------+---------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+-----------------+------------------+---------+---------+------+-------+----------+-------------+
| 1 | PRIMARY | s1 | NULL | ALL | idx_key3 | NULL | NULL | NULL | 97124 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | s2 | NULL | unique_subquery | PRIMARY,idx_key1 | PRIMARY | 4 | func | 1 | 10.00 | Using where |
+----+--------------------+-------+------------+-----------------+------------------+---------+---------+------+-------+----------+-------------+
-
index_subquery
类似于unique_subquery,不过是在相关子查询里面对非主键索引进行搜索.
EXPLAIN SELECT * FROM s1 WHERE common_field IN (SELECT key3 FROM s2 where s1.key1 = s2.key1) OR key3 = 'a';
+----+--------------------+-------+------------+----------------+-------------------+----------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+----------------+-------------------+----------+---------+------+-------+----------+-------------+
| 1 | PRIMARY | s1 | NULL | ALL | idx_key3 | NULL | NULL | NULL | 97124 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | s2 | NULL | index_subquery | idx_key1,idx_key3 | idx_key3 | 303 | func | 1 | 10.00 | Using where |
+----+--------------------+-------+------------+----------------+-------------------+----------+---------+------+-------+----------+-------------+
-
range
范围查询,情景包括<>,<,>,<=,>=,<=>,like,between,or,in()等操作符.此外官网还列出了=和is null,应该是写错了.
EXPLAIN SELECT * FROM s1 WHERE key1 IN ('a', 'b', 'c');
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | s1 | NULL | range | idx_key1 | idx_key1 | 303 | NULL | 3 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
或者
EXPLAIN SELECT * FROM s1 WHERE key1 > 'a' AND key1 < 'b';
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | s1 | NULL | range | idx_key1 | idx_key1 | 303 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
-
index
对非主键索引树的全扫描,字段被索引覆盖,但是无法确定查找范围只能全扫描.
EXPLAIN SELECT key_part2 FROM s1 WHERE key_part3 = 'a';
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+-------+----------+--------------------------+
| 1 | SIMPLE | s1 | NULL | index | idx_key_part | idx_key_part | 909 | NULL | 97124 | 10.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+-------+----------+--------------------------+
在上面的例子中,s1仅有idx_key_part这个索引,上面有key_part2和key_part3两列,select的字段被覆盖,但是where的字段无法应用索引进行等值或范围查询,只能全扫描,所以走主键索引或者idx_key_part索引区别不大,普通索引的规模比主键索引小,所以走了普通索引的全扫描.
-
all
对主键索引树的全扫描,也就是常常提到的全表扫描.性能最差.
type是分析SQL执行效率很重要的指标.
在SIMPLE类型的单表扫描里面,system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > range > index > all
range及以上都利用了索引,无需优化.在index和all的情况下访问大表会有点耗时,需要优化,调整SQL或者索引设置等.
子查询复杂一点,可能会执行多次,不能仅看访问方式,还要结合外层查询来分析具体会扫描多少行,所以一般建议少用子查询.
possible_key
执行中可能使用到的索引.
key
执行中实际用到的索引.
EXPLAIN SELECT * FROM s1 WHERE key1 > 'z' AND key3 = 'a';
+----+-------------+-------+------------+------+-------------------+----------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------+----------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | s1 | NULL | ref | idx_key1,idx_key3 | idx_key3 | 303 | const | 1 | 5.00 | Using where |
+----+-------------+-------+------------+------+-------------------+----------+---------+-------+------+----------+-------------+
脑补了这个SQL的执行过程,应该可以有三种方式:
- 走idx_key1索引,找出key1 > ‘z’的数据行,然后再筛选出key3 = ‘a’的结果.
- 走idx_key3索引,找出key3 = ‘a’的数据行,然后再筛选出key1 > ‘z’的结果.
- 使用索引合并,分别从idx_key1和idx_key3中找出key1 > ‘z’和key3 = ‘a’的结果,对两个结果集求交集.
走idx_key1索引是range,走idx_key3索引是ref,走索引合并是index_merge,所以选择了最高效的idx_key3索引.
也存在possible_key为空,但是key有值的情况.这一般发生在type=index的索引覆盖情况下,主键索引和普通索引都要全扫描,但是普通索引数据空间更小,所以MySQL选择遍历普通索引.
key_len
使用的索引的字段长度,如果是索引合并,那么就是这些索引的最大字段和.单位是Byte.仅参考表定义的字段类型最大长度,不是实际数据的字段长度.
ref
当使用索引列等值查询,也就是访问方式为const/eq_ref/ref/unique_subquery/index_subquery时,ref显示的就是用于等值查询的字段是什么.值为const表示匹配的是一个常量,值为func表示匹配的是一个函数.
rows
根据表统计信息和数据采样情况,MySQL预估执行查询需要扫描的行数.
这个值不一定精确,MySQL无法提前知道有多少行符合条件所以只能数据采样,根据几个数据页的情况推算出总的结果.
filtered
最终符合结果的行数占扫描行数的百分比,也可以理解为有效查询的百分比,值越大越好.
可以说SQL优化最终的目的就是降低rows中预估要扫描的行数,提高filtered中的百分比.
Extra
一些附加信息,像join查询可以分为Index Nested-Loop Join和Block Nested-Loop Join两种算法,具体采用哪一种就会在extra中写明.这里挑一些常见的进行说明.
-
Using filesort
需要排序(包括内存排序和外部文件排序),常见于对一个无法使用索引的字段(没建立索引或索引失效)进行order by.
-
Using index
索引覆盖,使用了普通索引,并且所需字段全部都能从该索引中获得,无须回表
-
Using index condition
索引下推,走了普通索引,但无法索引覆盖需要回表,此时对索引中包含的字段先做判断,直接过滤掉不满足条件的记录,减少回表次数.
-
Using where
走了普通索引,但是select或where中还有一些字段不存在于索引中,需要回表获取字段.
-
Using where;Using index
这个我不知道怎么解释,像前面说的,using where表示索引未覆盖,需要回表,using index表示索引覆盖,无需回表.它们是矛盾的,分开我能理解,合在一起我就很凌乱.官网也没说明.
EXPLAIN SELECT a FROM t1 where a > 'sad' ;
像这个简单的例子,在a上建了索引,如果是范围查询,那么extra就会显示Using where;Using index ,如果是等值查询,extra就会显示Using index.看了国内外的一些解释都无法说服我,感觉都在瞎说.
https://segmentfault.com/q/1010000004197413
-
Using index for group-by
使用了group by,并且group by过程走了索引.这个很少见,我没构造出相关例子,如果能走索引extra会显示Using where;Using index,如果不能就会显示Using temporary,Using filesort.
-
Using join buffer
使用了连接缓冲区,情景是join查询
-
Using temporary
使用了临时表,常见于order by和group by.
如何利用执行计划
对于简单的查询主要看key,type,rows,filtered这几列,如果rows很小(指生产环境)可以暂时不用优化.如果type为index或all,那么key里面的索引就没有覆盖到where子句里的全部字段,需要对索引进行调整.
对于join查询,被驱动表尽量走索引,如果无法走索引,join使用的就是Block Nested-Loop Join算法,在extra一列会看到Using join buffer(Block Nested Loop),驱动表和被驱动表都要全表扫描,而且当驱动表规模大于join buffer时,被驱动表会被全表扫描多次.
如果连接字段无法建立索引,或者不值得建立索引(大表的低频语句),有两种方案可以优化:
- 在应用中把要连接的两张表读取到内存,驱动表放到hashmap中,被驱动表通过hash去匹配连接.这个也是hash join的思路,MySQL8.0开始支持.
- 把被驱动表中符合条件的列放到临时表,在临时表中对连接字段建立索引,然后连接临时表.
create temporary table temp_t(id int primary key, a int, b int, index(b))engine=innodb;
insert into temp_t select * from t2 where b>=1 and b<=2000;
select * from t1 join temp_t on (t1.b=temp_t.b);
对于group by和order by,对应的字段尽量建立索引.
在extra中出现join buffer,sort buffer,temporary这些的都表明要使用额外内存去辅助计算,额外内存并不是毫无限制,像temporary的默认限制大小就是1G,所以这时就要关注数据的规模会不会过大,如果超出限制,join和sort会使用硬盘来完成计算,temporary就会报错.