Steve's Blog

Talk is cheap, show me the code.

0%

HashMap

1. 基础

常见的数据存储结构有:数组和链表,其中存储的数据都为有序。

  • 数组

优势:可以直接通过二分查找来找数据,时间复杂度$O(N)$,访问快

劣势:大小固定,需要添加元素且保持有序时,可能需要移动很多原来的元素,时间复杂度高为$O(N)$, 插入慢

  • 链表

优势:插入时时间复杂度为$O(N)$,但是不需要移动数据, 插入相对快

劣势:查询某个元素时平均时间复杂度为 $O(N )$,访问慢

  • 散列表HashMap

集合了数据和链表的优势

查询时间复杂度:$O(1)$

插入时间复杂度:$O(1)$

2. 常量,变量和构造方法

a. 常量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
/**
* The default initial capacity - MUST be a power of two.
* 默认table大小,为16
*/
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
* The maximum capacity, used if a higher value is implicitly specified
* by either of the constructors with arguments.
* MUST be a power of two <= 1<<30.
* table最大长度,为1,073,741,824
*/
static final int MAXIMUM_CAPACITY = 1 << 30;

/**
* The load factor used when none specified in constructor.
* 默认的负载因子
*/
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
* The bin count threshold for using a tree rather than list for a
* bin. Bins are converted to trees when adding an element to a
* bin with at least this many nodes. The value must be greater
* than 2 and should be at least 8 to mesh with assumptions in
* tree removal about conversion back to plain bins upon
* shrinkage.
* 树化阈值
*/
static final int TREEIFY_THRESHOLD = 8;

/**
* The bin count threshold for untreeifying a (split) bin during a
* resize operation. Should be less than TREEIFY_THRESHOLD, and at
* most 6 to mesh with shrinkage detection under removal.
* 树降级为链表的阈值
*/
static final int UNTREEIFY_THRESHOLD = 6;

/**
* The smallest table capacity for which bins may be treeified.
* (Otherwise the table is resized if too many nodes in a bin.)
* Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
* between resizing and treeification thresholds.
* table达到这个长度时才可以树化
*/
static final int MIN_TREEIFY_CAPACITY = 64;

b. 变量

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
/**
* The table, initialized on first use, and resized as
* necessary. When allocated, length is always a power of two.
* (We also tolerate length zero in some operations to allow
* bootstrapping mechanics that are currently not needed.)
* 哈希表,什么时候初始化?第一次put的时候
*/
transient Node<K,V>[] table;

/**
* Holds cached entrySet(). Note that AbstractMap fields are used
* for keySet() and values().
*/
transient Set<Map.Entry<K,V>> entrySet;

/**
* The number of key-value mappings contained in this map.
* 哈希表中元素的个数
*/
transient int size;

/**
* The number of times this HashMap has been structurally modified
* Structural modifications are those that change the number of mappings in
* the HashMap or otherwise modify its internal structure (e.g.,
* rehash). This field is used to make iterators on Collection-views of
* the HashMap fail-fast. (See ConcurrentModificationException).
* 哈希表结构变化次数:put已有的key不会造成modCount变化
*/
transient int modCount;

/**
* The next size value at which to resize (capacity * load factor).
* 触发resize()扩容的阈值,当元素个数大于这个值时,触发扩容
* @serial
*/
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

/**
* The load factor for the hash table.
* 负载因子:threshold = capacity * loadFactor,其中capacity是哈希表数组的长度
* @serial
*/
final float loadFactor;

c. 构造方法

一共4个构造方法

  • new HashMap();

  • new HashMap(int initialCapacity);

  • new HashMap(int initialCapacity, float loadFactor);

  • new HashMap(Map<? extends K, ? extends V> m);

第一个只设置了默认了loadFactor;

第二个和第三个底层都是调用了第三个方法,对loadFactor和theashold进行了赋值;

第三个对传入的map进行处理,如果不为空,则遍历传入的map,每个元素调用put()方法放入hashMap表中。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
/**
* Constructs an empty {@code HashMap} with the specified initial
* capacity and load factor.
*
* @param initialCapacity the initial capacity
* @param loadFactor the load factor
* @throws IllegalArgumentException if the initial capacity is negative
* or the load factor is nonpositive
* initialCapacity:初始哈希表大小,可以为0
* loadFactor:负载因子,必须大于0
*/
public HashMap(int initialCapacity, float loadFactor) {
// 判断传入的initialCapacity是否异常,其取值必须大于且小于MAXIMUM_CAPACITY(1 << 30)
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
// load factor必须大于0
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
// 此处有个针对threshold的处理方法,将其转为最小的大于initialCapacity且为2的幂次方的数
this.threshold = tableSizeFor(initialCapacity);
}

/**
* Constructs an empty {@code HashMap} with the specified initial
* capacity and the default load factor (0.75).
*
* @param initialCapacity the initial capacity.
* @throws IllegalArgumentException if the initial capacity is negative.
* 调用了双参数的构造方法
*/
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
* Constructs an empty {@code HashMap} with the default initial capacity
* (16) and the default load factor (0.75).
* 最常用的方法,只对load factor进行了设置,threshold在第一次put时设置,具体是在resize()方法中
*/
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
* Constructs a new {@code HashMap} with the same mappings as the
* specified {@code Map}. The {@code HashMap} is created with
* default load factor (0.75) and an initial capacity sufficient to
* hold the mappings in the specified {@code Map}.
*
* @param m the map whose mappings are to be placed in this map
* @throws NullPointerException if the specified map is null
*/
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}

d. tableSizeFor(int)方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/**
* Returns a power of two size for the given target capacity.
* 返回大于等于传入的数且为2的幂的数
*/
static final int tableSizeFor(int cap) {
// 减1后,n的非符号不为0的最高位一定小于或等于cap的
int n = cap - 1;
// 无符号右移1位
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

3. get()方法

a. 基本逻辑

get()方法调用了getNode()方法,基本逻辑是:

  1. 通过hash()方法计算出传入key对应的slot下标;
  2. 如果slot为空,则返回null,结束;
  3. 判断传入的key是否和slot中的第一个元素的key相等,相等则返回Node的value,结束;
  4. 如果和slot第一个元素key不相等且slot下存储的是红黑树,通过红黑树对应方法获取Node值,结束;
  5. 如果slot下存储的是链表,遍历链表,判断是否有Node的key和传入的key相等,相等则返回对应Node的值,否则返回null,结束。

b. get()方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null :
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}. (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}

c. hash()方法

1
2
3
4
5
6
7
8
9
   /**
* 散列函数
*/
static final int hash(Object key) {
// h表示计算出来的hash值,和key的hashCode()不完全相同
int h;

return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

这段代码是为了对key的hashCode进行扰动计算,防止不同hashCode的高位不同但低位相同导致的hash冲突。简单点说,就是为了把高位的特征和低位的特征组合起来,降低哈希冲突的概率,也就是说,尽量做到任何一位的变化都能对最终得到的结果产生影响。

扰动函数的作用还是减少哈希冲突,让元素存储的更加均匀。

d. getNode()方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
/**
* Implements Map.get and related methods
*
* @param hash hash for key
* @param key the key
* @return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
// tab : 哈希表
// first : 计算出的数组中slot下的第一个元素
// e : 临时node
// n : 哈希表的数组长度
// k : 临时用来比较的node的key
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
// 哈希表不为空,数组长度不为0且对应的slot里有数据时进入
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
// slot中的第一个元素满足条件,直接返回
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
// 不是第一个元素,需要遍历
if ((e = first.next) != null) {
// 如果是红黑树,从红黑树中拿
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
// 通过遍历循环获取其值
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}

4. put()方法

a. 基本逻辑

put()方法调用了putVal()方法,基本逻辑是:

  1. 判断哈希表是否初始化,如果未初始化,调用resize()方法初始化;
  2. 通过散列函数计算出来的hash值和哈希表长度-1进行与运算,得出要插入的slot下标;
  3. 判断哈希表中对应slot的情况,有三种:无元素,有元素且为链表,有元素且为红黑树;
  4. 无元素则新建一个新的Node节点,到第7步;
  5. 元素为红黑树,则往红黑树中插入或更新一个节点,到第7步;
  6. 元素为链表,遍历链表,如果存在Node的key和插入key相同,只更新Node的value,否则在Node尾部新插入一个Node节点。之后进行一次判断,当前链表长度是否大于树化阈值TREEIFY_THRESHOLD,这个值为8,也就是插入的元素为第9个元素时,调用树化方法treeifyBin();
  7. 判断是否有新增元素,有则modCount加1,表示hashMap的结构发生了修改。size加1;
  8. 判断当前size是否大于扩容阈值theashold,如果大于,调用resize()方法
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
* <tt>null</tt> if there was no mapping for <tt>key</tt>.
* (A <tt>null</tt> return can also indicate that the map
* previously associated <tt>null</tt> with <tt>key</tt>.)
*
* 将键值对插入hashMap
*/
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}

b. putVal()方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
/**
* Implements Map.put and related methods
*
* @param hash hash for key : key的哈希值
* @param key the key
* @param value the value to put : 要设置的值
* @param onlyIfAbsent if true, don't change existing value : 如果为true,只有key未保存在HashMap中时才进行插入,否则不处理
* @param evict if false, the table is in creation mode.
* @return previous value, or null if none : 返回之前key对应之前的value,如果不存在返回null
*/
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
// tab : 哈希表的引用
// p : 临时变量,表示slot下的entry节点
// n : 哈希表中的长度
// i : key对应的slot下标
Node<K,V>[] tab; Node<K,V> p; int n, i;
// table为空或者哈希表大小为0,此时对hashMap进行赋值
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
// (n - 1) & hash 计算得出数组下标,如果数组元素为空,则为其赋值一个新节点
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
else {
// e : 临时Node节点
// k : 要插入的key*
Node<K,V> e; K k;
// 如果key就是数组里的第一个元素,把p赋值给e
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
// 如果p是一个树节点,证明数组下标内存储的数据结构是红黑树,插入红黑树
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
else {
for (int binCount = 0; ; ++binCount) {
// 如果节点的next节点为空,表示一直未找到和插入key相同的key,此时直接在next节点插入新节点
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
// binCount从0开始,TREEIFY_THRESHOLD为8
// binCount为0时,实际插入的是第2个元素p.next,所以,binCount为7时,插入了第9个元素,此时需要树化
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
// 插入完跳出循环
break;
}
// 找到和要插入的key 相同的key,此时e持有这个节点的引用,直接跳出循环
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
// 上面两个if均不满足,表名p.next!=null 且没找到和要插入的key相同的Node,将e赋给p,继续下一次循环
p = e;
}
}
// 如果e不为空,表名前面的代码找到了旧的Node,进行value的赋值操作,此处操作完直接返回,没有对modCount进行操作
if (e != null) { // existing mapping for key
V oldValue = e.value;
// 如果onlyIfAbsent为false或者旧的value为空时,将新的value赋值进去
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
// 到此处表示对哈希表的结构进行了改变,modCount+1
++modCount;
// 如果添加了元素后的哈希表size大于扩容阈值,进行扩容
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}

5. resize()方法

resize()方法常见情况有以下几个地方调用:

  1. 第一次调用put()方法时,table为空,调用resize()方法初始化哈希表;
  2. put()方法插入新元素后,判断当前size是否大于扩容阈值theashold,大于则进行扩容操作;
  3. put()方法slot下为链表插入新元素后,如果需要链表长度大于树化阈值,则进行树化操作,此时,树化方法会判断哈希表数组长度是否大于64,小于时不会进行树化,而是调用resize()方法进行扩容;这边的64个人推测应该是一个折衷的数字,数组长度太小,则resize()方法时间消耗不大,扩容后元素分布会更加稀疏,相对树化是更优解。
  4. 调用putAll()方法时,底层调用putMapEntries()方法,如果传如的map的size大于扩容阈值theashold,调用resize()方法。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
/**
* Initializes or doubles table size. If null, allocates in
* accord with initial capacity target held in field threshold.
* Otherwise, because we are using power-of-two expansion, the
* elements from each bin must either stay at same index, or move
* with a power of two offset in the new table.
*
* @return the table
*/
final Node<K,V>[] resize() {
// oldTab : 扩容前的数组
Node<K,V>[] oldTab = table;
// oldCap : 扩容前的数组长度
int oldCap = (oldTab == null) ? 0 : oldTab.length;
// oldThr : 扩容前的扩容阈值
int oldThr = threshold;
// region 计算 newCap和newThr
// newCap : 扩容后的数组长度,需要计算
// newThr : 扩容后的扩容阈值,需要计算
int newCap, newThr = 0;
// 旧数组长度大于0,表示已经初始化过了
if (oldCap > 0) {
// 如果旧的数组长度已经大于最大数组长度MAXIMUM_CAPACITY,此时不再扩容,将扩容阈值设置为int最大值,直接返回
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
// newCap = oldCap << 1表示新的数组大小为旧数组的2倍
// 如果新数组的长度小于最大数组长度,且旧的数组长度大于等于默认初始长度16,扩容阈值也变为旧的2倍
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
// 之前未初始化(put操作)过,且旧的扩容阈值大于0,有两种情况
// new HashMap(int capacity, float loadFactor)
// new HashMap(int capacity)
// 此时threashold为第一次赋值的2的幂次方,所以直接赋值成为下一次的数组长度
else if (oldThr > 0) // initial capacity was placed in threshold
newCap = oldThr;
// 之前未初始化(put操作)过,且旧的扩容阈值为0,有一种情况
// new HashMap()
// 此时新的数组长度为16,新的阈值长度为默认长度乘以默认负载因子 16 * 0.75 = 12
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
// 未设置新的扩容阈值时,根据新的数组长度和负载因子算出来
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
// endregion
// 算出来的扩容阈值赋值给threshold
threshold = newThr;
// 创建新的数组
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
// 只有第一次创建时,旧的数组为空,此时直接返回
if (oldTab != null) {
// 遍历旧数组的,重新插入数据
for (int j = 0; j < oldCap; ++j) {
// e : 临时元素,持有当前操作的Node
Node<K,V> e;
// 如果当前下标数组元素不为空
if ((e = oldTab[j]) != null) {
// 旧数据被e持有,后续可直接对旧数组进行GC
oldTab[j] = null;
// 此下标只有一个元素,直接计算出元素的下标,赋值到新数组
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
// 如果是树节点,往红黑树插入
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
// 数组扩容后,长度多了一位,旧的hash值计算出来有两种情况
// 高位是0,此时和旧数组中的下标一致oldHash
// 高位是1,此时下标为 oldHash + oldCap
// 低位链表 loHead链表头,loTail链表尾部
Node<K,V> loHead = null, loTail = null;
// 高位链表
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
// (e.hash & oldCap) == 0表明新hash值是低位
if ((e.hash & oldCap) == 0) {
// 第一次插入loTail为空,把loHead赋值为当前Node
if (loTail == null)
loHead = e;
// 不是第一次插入,在loTail后面新插入即可
else
loTail.next = e;
// loTail一直指向最新插入的值
loTail = e;
}
// 高位插入,具体逻辑和低位类似
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
// 去掉新插入后低位链表尾部的引用
loTail.next = null;
// 把低位链表头赋值到数组里,下标是当前下标
newTab[j] = loHead;
}
if (hiTail != null) {
// 去掉新插入后低位链表尾部的引用
hiTail.next = null;
// 把低位链表头赋值到数组里,下标是当前下标 + oldCap
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}

参考文档

[1] JDK 源码中 HashMap 的 hash 方法原理是什么?

[2] HashMap及其原理