postgresql 中的 like 查询更新方案

当时数量量比较庞大的时候,做模糊查询效率很慢,为了优化查询效率,尝试如下方法做效率对比。
 
一、对比情况说明:
 
1、数据量100w条数据
 
2、执行sql
 
二、对比结果
 
explain analyze SELECT
 c_patent,
 c_applyissno,
 d_applyissdate,
 d_applydate,
 c_patenttype_dimn,
 c_newlawstatus,
 c_abstract
FROM
 public.t_knowl_patent_zlxx_temp
WHERE
 c_applicant LIKE '%本溪满族自治县连山关镇安平安养殖场%';
 
 
 
1、未建索时执行计划:
 
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3)
  Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
  Rows Removed by Filter: 333333
Planning time: 0.272 ms
Execution time: 228.116 ms"
 
2、btree索引
 
建索引语句
 
1CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);
 
执行计划
 
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3)
  Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
  Rows Removed by Filter: 333333
Planning time: 0.116 ms
Execution time: 218.189 ms"
 
但是如果将查询sql稍微改动一下,把like查询中的前置%去掉是这样的
 
Index Scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows=92 width=1278) (actual time=0.292..0.292 rows=0 loops=1)
 Index Cond: (((c_applicant)::text ~>=~ '本溪满族自治县连山关镇安平安养殖场'::text) AND ((c_applicant)::text ~<~ '本溪满族自治县连山关镇安平安养殖圻'::text))
 Filter: ((c_applicant)::text ~~ '本溪满族自治县连山关镇安平安养殖场%'::text)
Planning time: 0.710 ms
Execution time: 0.378 ms
 
3、gin索引
 
创建索引语句(postgresql要求在9.6版本及以上)
 
create extension pg_trgm;CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx USING gin (c_applicant gin_trgm_ops);
 
执行计划
 
Bitmap Heap Scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows=91 width=1268) (actual time=0.649..0.649 rows=0 loops=1)
 Recheck Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
 -> Bitmap Index Scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows=91 width=0) (actual time=0.647..0.647 rows=0 loops=1)
  Index Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)
Planning time: 0.673 ms
Execution time: 0.740 ms
 
三、结论
 
btree索引可以让后置% "abc%"的模糊匹配走索引,gin + gp_trgm可以让前后置% "%abc%" 走索引。但是gin 索引也有弊端,以下情况可能导致无法命中:
 
搜索字段少于3个字符时,不会命中索引,这是gin自身机制导致。
 
当搜索字段过长时,比如email检索,可能也不会命中索引,造成原因暂时未知。
 
补充:PostgreSQL LIKE 查询效率提升实验
 
未做索引的查询效率
 
作为对比,先对未索引的查询做测试
 
EXPLAIN ANALYZE select * from gallery_map where author = '曹志耘';
             QUERY PLAN            
—————————————————————————————————————–
 Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.011..39.753 rows=1031 loops=1)
 Filter: ((author)::text = '曹志耘'::text)
 Rows Removed by Filter: 71315
 Planning time: 0.194 ms
 Execution time: 39.879 ms
(5 rows)
 
Time: 40.599 ms
EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘';
             QUERY PLAN            
—————————————————————————————————————–
 Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.017..41.513 rows=1031 loops=1)
 Filter: ((author)::text ~~ '曹志耘'::text)
 Rows Removed by Filter: 71315
 Planning time: 0.188 ms
 Execution time: 41.669 ms
(5 rows)
 
Time: 42.457 ms
 
EXPLAIN ANALYZE select * from gallery_map where author like '曹志耘%';
             QUERY PLAN            
—————————————————————————————————————–
 Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.017..41.492 rows=1031 loops=1)
 Filter: ((author)::text ~~ '曹志耘%'::text)
 Rows Removed by Filter: 71315
 Planning time: 0.307 ms
 Execution time: 41.633 ms
(5 rows)
 
Time: 42.676 ms
 
 
 
很显然都会做全表扫描
【声明】:芜湖站长网内容转载自互联网,其相关言论仅代表作者个人观点绝非权威,不代表本站立场。如您发现内容存在版权问题,请提交相关链接至邮箱:bqsm@foxmail.com,我们将及时予以处理。

相关文章