postgresql 索引之 hash的使用方法

这篇文章主要介绍了postgresql 索引之 hash的使用详解，具有很好的参考价值，希望对大家有所帮助。一起跟随小编过来看看吧。

os: ubuntu 16.04

postgresql: 9.6.8

ip 规划

192.168.56.102 node2 postgresql

help create index

postgres=# \h create index

Command: CREATE INDEX

Description: define a new index

Syntax:

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]

( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, …] )

[ WITH ( storage_parameter = value [, … ] ) ]

[ TABLESPACE tablespace_name ]

[ WHERE predicate ]

[ USING method ]

method

要使用的索引方法的名称。可以选择 btree、hash、 gist、spgist、 gin以及brin。默认方法是btree。

hash

hash 只能处理简单的等值比较，

postgres=# drop table tmp_t0;

DROP TABLE

postgres=# create table tmp_t0(c0 varchar(100),c1 varchar(100));

CREATE TABLE

postgres=# insert into tmp_t0(c0,c1) select md5(id::varchar),md5((id+id)::varchar) from generate_series(1,100000) as id;

INSERT 0 100000

postgres=# create index idx_tmp_t0_1 on tmp_t0 using hash(c0);

CREATE INDEX

postgres=# \d+ tmp_t0

Table "public.tmp_t0"

——–+————————+———–+———-+———+———-+————–+————-

Indexes:

"idx_tmp_t0_1" hash (c0)

postgres=# explain select * from tmp_t0 where c0 = 'd3d9446802a44259755d38e6d163e820';

QUERY PLAN

—————————————————————————-

Index Scan using idx_tmp_t0_1 on tmp_t0 (cost=0.00..8.02 rows=1 width=66)

Index Cond: ((c0)::text = 'd3d9446802a44259755d38e6d163e820'::text)

(2 rows)

注意事项，官网特别强调：

Hash索引操作目前不被WAL记录，因此存在未写入修改，在数据库崩溃后需要用REINDEX命令重建Hash索引。

同样，在完成初始的基础备份后，对于Hash索引的改变也不会通过流式或基于文件的复制所复制，所以它们会对其后使用它们的查询给出错误的答案。

正因为这些原因，Hash索引已不再被建议使用。

补充：Postgresql hash索引介绍

hash索引的结构

当数据插入索引时，我们会为这个索引键通过哈希函数计算一个值。 PostgreSQL中的哈希函数始终返回“整数”类型，范围为2^32≈40亿。bucket桶的数量最初为2个，然后动态增加以适应数据大小。可以使用位算法从哈希码计算出桶编号。这个bucket将存放TID。

由于可以将与不同索引键匹配的TID放入同一bucket桶中。而且除了TID之外，还可以将键的源值存储在bucket桶中，但这会增加索引大小。为了节省空间，bucket桶只存储索引键的哈希码，而不存储索引键。

当我们通过索引查询时，我们计算索引键的哈希函数并获取bucket桶的编号。现在，仍然需要遍历存储桶的内容，并仅返回所需的哈希码匹配的TID。由于存储的“hash code – TID”对是有序的，因此可以高效地完成此操作。

但是，两个不同的索引键可能会发生以下情况，两个索引键都进入一个bucket桶，而且具有相同的四字节的哈希码。因此，索引访问方法要求索引引擎重新检查表行中的情况来验证每个TID。

映射数据结构到page

Meta page – 0号page,包含索引内部相关信息

Bucket pages – 索引的主要page,存储 “hash code – TID” 对

Overflow pages – 与bucket page的结构相同，在不足一个page时,作为bucket桶使用

Bitmap pages – 跟踪当前干净的overflow page，并可将其重新用于其他bucket桶

注意，哈希索引不能减小大小。虽然我们删除了一些索引行，但是分配的页面将不会返回到操作系统，只会在VACUUMING之后重新用于新数据。减小索引大小的唯一选项是使用REINDEX或VACUUM FULL命令从头开始重建索引

接下来看下hash索引如何创建

demo=# create index on flights using hash(flight_no);

demo=# explain (costs off) select * from flights where flight_no = 'PG0001';

QUERY PLAN

—————————————————-

Bitmap Heap Scan on flights

Recheck Cond: (flight_no = 'PG0001'::bpchar)

-> Bitmap Index Scan on flights_flight_no_idx

Index Cond: (flight_no = 'PG0001'::bpchar)

(4 rows)

注意：10版本之前hash索引不记录到wal中，所以hash索引不能做recovery，当然也就不能复制了，但是从10版本以后hash所用得到了增强，可以记录到wal中，创建的时候也不会再有警告。

查看hash访问方法相关的操作函数

demo=# select opf.opfname as opfamily_name,

amproc.amproc::regproc AS opfamily_procedure

from pg_am am,

pg_opfamily opf,

pg_amproc amproc

where opf.opfmethod = am.oid

and amproc.amprocfamily = opf.oid

and am.amname = 'hash'

order by opfamily_name,

opfamily_procedure;

opfamily_name | opfamily_procedure

——————–+————————-

abstime_ops | hashint4extended

abstime_ops | hashint4

aclitem_ops | hash_aclitem

aclitem_ops | hash_aclitem_extended

array_ops | hash_array

array_ops | hash_array_extended

bool_ops | hashcharextended

bool_ops | hashchar

bpchar_ops | hashbpcharextended

bpchar_ops | hashbpchar

bpchar_pattern_ops | hashbpcharextended

bpchar_pattern_ops | hashbpchar

bytea_ops | hashvarlena

bytea_ops | hashvarlenaextended

char_ops | hashcharextended

char_ops | hashchar

cid_ops | hashint4extended

cid_ops | hashint4

date_ops | hashint4extended

date_ops | hashint4

enum_ops | hashenumextended

enum_ops | hashenum

float_ops | hashfloat4extended

float_ops | hashfloat8extended

float_ops | hashfloat4

float_ops | hashfloat8

…

可以用这些函数计算相关类型的哈希码

hank=# select hashtext('zhang');

hashtext

————-

-1172392837

(1 row)

hank=# select hashint4(10);

hashint4

————-

-1547814713

(1 row)

相关文章

【独家】新网站设计趋势一览（续）

【独家】如何进行有效的搜索引擎优化

【首发】企业网站的域名选择与保护

【首发】电商网站的物流配送解决方案

【首发】如何防范网站被黑客攻击

【独家】内容营销最佳实践分享（续）