MySQL LIMIT Query Optimization

Preface

As we know, a MySQL pagination query using LIMIT, such as the following:

SELECT * FROM my_table WHERE index_col = xxx LIMIT offset, row_count;

will still read through all of the rows before offset.

Concretely, it locates the rows matching index_col = xxx, scans forward past the first offset rows, and then returns the next row_count rows.
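For reference, MySQL also accepts an equivalent LIMIT ... OFFSET spelling of the same query:

SELECT * FROM my_table WHERE index_col = xxx LIMIT row_count OFFSET offset;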

As the data volume grows, execution efficiency drops.

What does that mean in practice?

Let's look at an example and walk through the relevant optimizations.

Main Text

Preparation

We prepare a student table (any test table will do) with the following structure:

CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `stu_no` varchar(20) NOT NULL COMMENT 'student number',
  `stu_sex` char(1) DEFAULT NULL COMMENT 'student sex',
  `stu_birthday` date DEFAULT NULL COMMENT 'student birthday',
  `stu_class` char(2) DEFAULT NULL COMMENT 'student class',
  `stu_name` varchar(50) DEFAULT NULL COMMENT 'student name',
  PRIMARY KEY (`id`),
  KEY `idx_stu_no` (`stu_no`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Then we load it with 2,000,000 rows (the more the better).

Part of the relevant code is shown below:

import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Random;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest
public class LimitTest {

    @Autowired
    StudentMapper studentMapper;

    @Test
    public void makeData() {
        Random random = new Random();
        int num = 2000000; // total number of rows to insert
        int step = 100;    // rows per batch
        int n = num / step;
        for (int k = 0; k < n; k++) {
            List<StudentModel> list = new ArrayList<>();
            for (int i = 0; i < step; i++) {
                StudentModel studentModel = new StudentModel();
                studentModel.setStuNo(k * step + i + "");
                studentModel.setStuSex(random.nextBoolean() ? "1" : "0");
                studentModel.setStuClass(random.nextInt(4) + 1 + "");
                studentModel.setStuBirthday(new Date());
                studentModel.setStuName(getChineseName());
                list.add(studentModel);
            }
            studentMapper.insertBatch(list);
        }
    }

    // Builds a random Chinese name: a random surname plus one random
    // GB2312 character generated from its zone/position encoding.
    public static String getChineseName() {
        Random random = new Random();
        // Zone byte: GB2312 characters start at zone 16 (0xB0 = 176);
        // zones 16-55 hold level-1 Chinese characters, zones 56-87 level-2.
        int highPos = 176 + random.nextInt(72);
        // Position byte: offset by 0xA0, valid positions are 1-94 (0xA1 = 161).
        int lowPos = 161 + random.nextInt(94);

        byte[] bArr = new byte[2];
        bArr[0] = (byte) highPos;
        bArr[1] = (byte) lowPos;
        try {
            // Combine the zone and position bytes into one character.
            String str = new String(bArr, "GB2312");
            // Prepend a random surname.
            return Surname[random.nextInt(Surname.length)] + str;
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return null;
        }
    }

    // The Hundred Family Surnames: common single and compound Chinese surnames.
private static final String[] Surname= {"赵","钱","孙","李","周","吴","郑","王","冯","陈","褚","卫","蒋","沈","韩","杨","朱","秦","尤","许","何","吕","施","张","孔","曹","严","华","金","魏","陶","姜","戚","谢","邹","喻","柏","水","窦","章","云","苏","潘","葛","奚","范","彭","郎","鲁","韦","昌","马","苗","凤","花","方","俞","任","袁","柳","酆","鲍","史","唐","费","廉","岑","薛","雷","贺","倪","汤","滕","殷","罗","毕","郝","邬","安","常","乐","于","时","傅","皮","卞","齐","康","伍","余","元","卜","顾","孟","平","黄","和","穆","萧","尹","姚","邵","湛","汪","祁","毛","禹","狄","米","贝","明","臧","计","伏","成","戴","谈","宋","茅","庞","熊","纪","舒","屈","项","祝","董","梁","杜","阮","蓝","闵","席","季","麻","强","贾","路","娄","危","江","童","颜","郭","梅","盛","林","刁","钟","徐","邱","骆","高","夏","蔡","田","樊","胡","凌","霍","虞","万","支","柯","昝","管","卢","莫","经","房","裘","缪","干","解","应","宗","丁","宣","贲","邓","郁","单","杭","洪","包","诸","左","石","崔","吉","钮","龚","程","嵇","邢","滑","裴","陆","荣","翁","荀","羊","于","惠","甄","曲","家","封","芮","羿","储","靳","汲","邴","糜","松","井","段","富","巫","乌","焦","巴","弓","牧","隗","山","谷","车","侯","宓","蓬","全","郗","班","仰","秋","仲","伊","宫","宁","仇","栾","暴","甘","钭","厉","戎","祖","武","符","刘","景","詹","束","龙","叶","幸","司","韶","郜","黎","蓟","溥","印","宿","白","怀","蒲","邰","从","鄂","索","咸","籍","赖","卓","蔺","屠","蒙","池","乔","阴","郁","胥","能","苍","双","闻","莘","党","翟","谭","贡","劳","逄","姬","申","扶","堵","冉","宰","郦","雍","却",
"璩","桑","桂","濮","牛","寿","通","边","扈","燕","冀","浦","尚","农","温","别","庄","晏","柴","瞿","阎","充","慕","连","茹","习","宦","艾","鱼","容","向","古","易","慎","戈","廖","庾","终","暨","居","衡","步","都","耿","满","弘","匡","国","文","寇","广","禄","阙","东","欧","殳","沃","利","蔚","越","夔","隆","师","巩","厍","聂","晁","勾","敖","融","冷","訾","辛","阚","那","简","饶","空","曾","毋","沙","乜","养","鞠","须","丰","巢","关","蒯","相","查","后","荆","红","游","郏","竺","权","逯","盖","益","桓","公","仉","督","岳","帅","缑","亢","况","郈","有","琴","归","海","晋","楚","闫","法","汝","鄢","涂","钦","商","牟","佘","佴","伯","赏","墨","哈","谯","篁","年","爱","阳","佟","言","福","南","火","铁","迟","漆","官","冼","真","展","繁","檀","祭","密","敬","揭","舜","楼","疏","冒","浑","挚","胶","随","高","皋","原","种","练","弥","仓","眭","蹇","覃","阿","门","恽","来","綦","召","仪","风","介","巨","木","京","狐","郇","虎","枚","抗","达","杞","苌","折","麦","庆","过","竹","端","鲜","皇","亓","老","是","秘","畅","邝","还","宾","闾","辜","纵","侴","万俟","司马","上官","欧阳","夏侯","诸葛","闻人","东方","赫连","皇甫","羊舌","尉迟","公羊","澹台","公冶","宗正","濮阳","淳于","单于","太叔","申屠","公孙","仲孙","轩辕","令狐","钟离","宇文","长孙","慕容","鲜于","闾丘","司徒","司空","兀官","司寇","南门","呼延","子车","颛孙","端木","巫马","公西","漆雕","车正","壤驷","公良","拓跋","夹谷","宰父","谷梁","段干","百里","东郭","微生","梁丘","左丘","东门","西门","南宫","第五","公仪","公乘","太史","仲长","叔孙","屈突","尔朱","东乡","相里","胡母","司城","张廖","雍门","毋丘","贺兰","綦毋","屋庐","独孤","南郭","北宫","王孙"};
}

With that, our data preparation is complete.
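One note on the code above: the insertBatch mapper method is not shown. Presumably it issues one multi-row INSERT per batch of 100, along these lines (a sketch with illustrative values, not the project's actual mapper):

INSERT INTO student (stu_no, stu_sex, stu_class, stu_birthday, stu_name)
VALUES ('0', '1', '3', '2020-01-01', '赵雷'),
       ('1', '0', '2', '2020-01-01', '钱明');

Batching keeps the number of round trips at 20,000 instead of 2,000,000, which matters when loading this much data.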

Testing

Let's run some tests with the LIMIT pagination statement.

SELECT * FROM student ORDER BY stu_no LIMIT 10, 10;

We can see that this query comes back quickly.

Now let's increase the offset and fetch the 10 rows after the first 1,000,000.

SELECT * FROM student ORDER BY stu_no LIMIT 1000000, 10;

This time the elapsed time reaches 3.6 s.

If we keep increasing the offset, the query takes even longer.

Analysis

When we created the table, stu_no was indexed, so let's analyze the two statements with EXPLAIN.

EXPLAIN SELECT * FROM student ORDER BY stu_no LIMIT 10, 10;

We can see that with a small offset, the stu_no index is used.

EXPLAIN SELECT * FROM student ORDER BY stu_no LIMIT 1000000, 10;

We can see that with a large offset, the statement performs a full table scan instead, and that is the source of the slowdown.

Why does this happen?

The index we built on stu_no is a secondary B+Tree index, and the leaf nodes of a secondary index store primary key values rather than full rows. So when we sort by stu_no, the logical plan is: walk the index to find each row's primary key id, then fetch the row through the primary key index (a back-to-table lookup, 回表). With a large offset, MySQL would have to pull out every entry before the offset this way, so after weighing the cost the query optimizer decides a full table scan is actually cheaper and picks that plan.

That explains exactly what we observed above.
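If you want to see how many rows the server actually touches, MySQL's session handler counters offer a quick check (standard MySQL tooling, not part of the original test):

FLUSH STATUS;
SELECT * FROM student ORDER BY stu_no LIMIT 1000000, 10;
SHOW SESSION STATUS LIKE 'Handler_read%';

A Handler_read_rnd_next count in the millions after the query is the signature of a full table scan: InnoDB read the rows one by one instead of jumping to the requested page.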

Optimization

Now that we have pinned down the problem, how do we optimize?

First, what happens if we force the query to use the stu_no index?

SELECT * FROM student FORCE INDEX(idx_stu_no) ORDER BY stu_no LIMIT 1000000, 10;

This takes 5 s, slower than the full table scan, so the query optimizer's judgment was correct.

So what optimizations are available?

What we want are the 10 rows after the first 1,000,000; the earlier rows are of no use to us, yet MySQL still has to work through all 1,000,000 of them.

So let's keep only the most minimal piece of information, the primary key:

SELECT id FROM student ORDER BY stu_no LIMIT 1000000, 10;

We can see this query is fast and does use the index: it is answered entirely from the stu_no B+Tree (whose leaf entries already contain the primary key), so no back-to-table lookups are needed.
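You can confirm this with EXPLAIN: because the statement is satisfied entirely from idx_stu_no (a covering index), the Extra column should report Using index.

EXPLAIN SELECT id FROM student ORDER BY stu_no LIMIT 1000000, 10;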

We now have the primary keys of the 10 rows after the first 1,000,000. Treating them as a derived table, we simply join back to student to fetch the full rows.

SELECT * FROM student s, (SELECT id FROM student ORDER BY stu_no LIMIT 1000000, 10) temp WHERE temp.id = s.id;

We can see the optimized query's elapsed time drops dramatically, and running EXPLAIN shows the new plan.
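If you prefer explicit join syntax, the same rewrite can be expressed as an INNER JOIN; the two forms are equivalent here:

SELECT s.*
FROM student s
INNER JOIN (SELECT id FROM student ORDER BY stu_no LIMIT 1000000, 10) temp ON temp.id = s.id;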

Summary

That wraps up the optimization of MySQL's LIMIT pagination statement.

The key idea:

  • Using the query condition (whose column is assumed to be indexed) plus LIMIT, fetch just the list of primary keys (this reads the index tree directly, with no back-to-table lookups);
  • Join that list of primary keys back to the main table to fetch the full rows, as in the template below.
SELECT * FROM my_table s, (SELECT id FROM my_table ORDER BY index_col LIMIT 1000000, 10) temp WHERE temp.id = s.id;

And why the plain LIMIT statement is slow:

SELECT * FROM my_table ORDER BY index_col LIMIT 1000000, 10;

  • MySQL has to read through the first 1,000,000 rows before it can return the next 10; with the rewrite above, traversing those first 1,000,000 entries is handled entirely inside the B+Tree index built on index_col.


