1. 基础

常见的数据存储结构有：数组和链表，其中存储的数据都为有序。

数组

优势：可以直接通过二分查找来找数据，时间复杂度$O(N)$，访问快

劣势：大小固定，需要添加元素且保持有序时，可能需要移动很多原来的元素，时间复杂度高为$O(N)$， 插入慢

链表

优势：插入时时间复杂度为$O(N)$，但是不需要移动数据， 插入相对快

劣势：查询某个元素时平均时间复杂度为 $O(N )$，访问慢

散列表HashMap

集合了数据和链表的优势

查询时间复杂度：$O(1)$

插入时间复杂度：$O(1)$

2. 常量，变量和构造方法

a. 常量

/**
   * The default initial capacity - MUST be a power of two.
   * 默认table大小，为16
   */
  static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

  /**
   * The maximum capacity, used if a higher value is implicitly specified
   * by either of the constructors with arguments.
   * MUST be a power of two <= 1<<30.
   * table最大长度，为1,073,741,824
   */
  static final int MAXIMUM_CAPACITY = 1 << 30;

  /**
   * The load factor used when none specified in constructor.
   * 默认的负载因子
   */
  static final float DEFAULT_LOAD_FACTOR = 0.75f;

  /**
   * The bin count threshold for using a tree rather than list for a
   * bin.  Bins are converted to trees when adding an element to a
   * bin with at least this many nodes. The value must be greater
   * than 2 and should be at least 8 to mesh with assumptions in
   * tree removal about conversion back to plain bins upon
   * shrinkage.
   * 树化阈值
   */
  static final int TREEIFY_THRESHOLD = 8;

  /**
   * The bin count threshold for untreeifying a (split) bin during a
   * resize operation. Should be less than TREEIFY_THRESHOLD, and at
   * most 6 to mesh with shrinkage detection under removal.
   * 树降级为链表的阈值
   */
  static final int UNTREEIFY_THRESHOLD = 6;

  /**
   * The smallest table capacity for which bins may be treeified.
   * (Otherwise the table is resized if too many nodes in a bin.)
   * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
   * between resizing and treeification thresholds.
   * table达到这个长度时才可以树化
   */
  static final int MIN_TREEIFY_CAPACITY = 64;

b. 变量

/**
    * The table, initialized on first use, and resized as
    * necessary. When allocated, length is always a power of two.
    * (We also tolerate length zero in some operations to allow
    * bootstrapping mechanics that are currently not needed.)
    * 哈希表，什么时候初始化？第一次put的时候
    */
   transient Node<K,V>[] table;

   /**
    * Holds cached entrySet(). Note that AbstractMap fields are used
    * for keySet() and values().
    */
   transient Set<Map.Entry<K,V>> entrySet;

   /**
    * The number of key-value mappings contained in this map.
    * 哈希表中元素的个数
    */
   transient int size;

   /**
    * The number of times this HashMap has been structurally modified
    * Structural modifications are those that change the number of mappings in
    * the HashMap or otherwise modify its internal structure (e.g.,
    * rehash).  This field is used to make iterators on Collection-views of
    * the HashMap fail-fast.  (See ConcurrentModificationException).
    * 哈希表结构变化次数：put已有的key不会造成modCount变化
    */
   transient int modCount;

   /**
    * The next size value at which to resize (capacity * load factor).
    * 触发resize()扩容的阈值，当元素个数大于这个值时，触发扩容
    * @serial
    */
   // (The javadoc description is true upon serialization.
   // Additionally, if the table array has not been allocated, this
   // field holds the initial array capacity, or zero signifying
   // DEFAULT_INITIAL_CAPACITY.)
   int threshold;

   /**
    * The load factor for the hash table.
    * 负载因子：threshold = capacity * loadFactor，其中capacity是哈希表数组的长度 
    * @serial
    */
   final float loadFactor;

c. 构造方法

一共4个构造方法

new HashMap();
new HashMap(int initialCapacity);
new HashMap(int initialCapacity, float loadFactor);
new HashMap(Map<? extends K, ? extends V> m);

第一个只设置了默认了loadFactor；

第二个和第三个底层都是调用了第三个方法，对loadFactor和theashold进行了赋值；

第三个对传入的map进行处理，如果不为空，则遍历传入的map，每个元素调用put()方法放入hashMap表中。

/**
     * Constructs an empty {@code HashMap} with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     * initialCapacity：初始哈希表大小，可以为0
     * loadFactor：负载因子，必须大于0      
     */
    public HashMap(int initialCapacity, float loadFactor) {
    	// 判断传入的initialCapacity是否异常，其取值必须大于且小于MAXIMUM_CAPACITY（1 << 30）
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
            // load factor必须大于0
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        // 此处有个针对threshold的处理方法，将其转为最小的大于initialCapacity且为2的幂次方的数
        this.threshold = tableSizeFor(initialCapacity);
    }

    /**
     * Constructs an empty {@code HashMap} with the specified initial
     * capacity and the default load factor (0.75).
     *
     * @param  initialCapacity the initial capacity.
     * @throws IllegalArgumentException if the initial capacity is negative.
     * 调用了双参数的构造方法
     */
    public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

    /**
     * Constructs an empty {@code HashMap} with the default initial capacity
     * (16) and the default load factor (0.75).
     * 最常用的方法，只对load factor进行了设置，threshold在第一次put时设置，具体是在resize()方法中
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
    }

    /**
     * Constructs a new {@code HashMap} with the same mappings as the
     * specified {@code Map}.  The {@code HashMap} is created with
     * default load factor (0.75) and an initial capacity sufficient to
     * hold the mappings in the specified {@code Map}.
     *
     * @param   m the map whose mappings are to be placed in this map
     * @throws  NullPointerException if the specified map is null
     */
    public HashMap(Map<? extends K, ? extends V> m) {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        putMapEntries(m, false);
    }

d. `tableSizeFor(int)`方法

/**
     * Returns a power of two size for the given target capacity.
     * 返回大于等于传入的数且为2的幂的数
     */
    static final int tableSizeFor(int cap) {
        // 减1后，n的非符号不为0的最高位一定小于或等于cap的
        int n = cap - 1;
        // 无符号右移1位
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

3. get()方法

a. 基本逻辑

get()方法调用了getNode()方法，基本逻辑是：

通过hash()方法计算出传入key对应的slot下标；
如果slot为空，则返回null，结束；
判断传入的key是否和slot中的第一个元素的key相等，相等则返回Node的value，结束；
如果和slot第一个元素key不相等且slot下存储的是红黑树，通过红黑树对应方法获取Node值，结束；
如果slot下存储的是链表，遍历链表，判断是否有Node的key和传入的key相等，相等则返回对应Node的值，否则返回null，结束。

b. `get()`方法

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

c. `hash()`方法

   /**
    * 散列函数
    */
static final int hash(Object key) {
       // h表示计算出来的hash值，和key的hashCode()不完全相同
       int h;
       
       return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
   }

这段代码是为了对key的hashCode进行扰动计算，防止不同hashCode的高位不同但低位相同导致的hash冲突。简单点说，就是为了把高位的特征和低位的特征组合起来，降低哈希冲突的概率，也就是说，尽量做到任何一位的变化都能对最终得到的结果产生影响。

扰动函数的作用还是减少哈希冲突，让元素存储的更加均匀。

d. `getNode()`方法

/**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        // tab : 哈希表
        // first : 计算出的数组中slot下的第一个元素
        // e : 临时node
        // n : 哈希表的数组长度
        // k : 临时用来比较的node的key
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // 哈希表不为空，数组长度不为0且对应的slot里有数据时进入
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            // slot中的第一个元素满足条件，直接返回
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // 不是第一个元素，需要遍历
            if ((e = first.next) != null) {
                // 如果是红黑树，从红黑树中拿
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // 通过遍历循环获取其值
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

4. put()方法

a. 基本逻辑

put()方法调用了putVal()方法，基本逻辑是：

判断哈希表是否初始化，如果未初始化，调用resize()方法初始化；
通过散列函数计算出来的hash值和哈希表长度-1进行与运算，得出要插入的slot下标；
判断哈希表中对应slot的情况，有三种：无元素，有元素且为链表，有元素且为红黑树；
无元素则新建一个新的Node节点，到第7步；
元素为红黑树，则往红黑树中插入或更新一个节点，到第7步；
元素为链表，遍历链表，如果存在Node的key和插入key相同，只更新Node的value，否则在Node尾部新插入一个Node节点。之后进行一次判断，当前链表长度是否大于树化阈值TREEIFY_THRESHOLD，这个值为8，也就是插入的元素为第9个元素时，调用树化方法treeifyBin();
判断是否有新增元素，有则modCount加1，表示hashMap的结构发生了修改。size加1；
判断当前size是否大于扩容阈值theashold，如果大于，调用resize()方法

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 *
 * 将键值对插入hashMap
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

b. `putVal()`方法

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key : key的哈希值
 * @param key the key 
 * @param value the value to put : 要设置的值
 * @param onlyIfAbsent if true, don't change existing value : 如果为true，只有key未保存在HashMap中时才进行插入，否则不处理
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none : 返回之前key对应之前的value，如果不存在返回null
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    // tab : 哈希表的引用
    // p : 临时变量，表示slot下的entry节点
    // n : 哈希表中的长度
    // i : key对应的slot下标
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // table为空或者哈希表大小为0，此时对hashMap进行赋值
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // (n - 1) & hash 计算得出数组下标，如果数组元素为空，则为其赋值一个新节点
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // e : 临时Node节点
        // k : 要插入的key*
        Node<K,V> e; K k;
        // 如果key就是数组里的第一个元素，把p赋值给e
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 如果p是一个树节点，证明数组下标内存储的数据结构是红黑树，插入红黑树
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                // 如果节点的next节点为空，表示一直未找到和插入key相同的key，此时直接在next节点插入新节点
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // binCount从0开始，TREEIFY_THRESHOLD为8
                    // binCount为0时，实际插入的是第2个元素p.next，所以，binCount为7时，插入了第9个元素，此时需要树化
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // 插入完跳出循环
                    break;
                }
                // 找到和要插入的key 相同的key，此时e持有这个节点的引用，直接跳出循环
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                // 上面两个if均不满足，表名p.next!=null 且没找到和要插入的key相同的Node，将e赋给p，继续下一次循环
                p = e;
            }
        }
        // 如果e不为空，表名前面的代码找到了旧的Node，进行value的赋值操作，此处操作完直接返回，没有对modCount进行操作
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // 如果onlyIfAbsent为false或者旧的value为空时，将新的value赋值进去
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // 到此处表示对哈希表的结构进行了改变，modCount+1
    ++modCount;
    // 如果添加了元素后的哈希表size大于扩容阈值，进行扩容
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

5. resize()方法

resize()方法常见情况有以下几个地方调用：

第一次调用put()方法时，table为空，调用resize()方法初始化哈希表；
put()方法插入新元素后，判断当前size是否大于扩容阈值theashold，大于则进行扩容操作；
put()方法slot下为链表插入新元素后，如果需要链表长度大于树化阈值，则进行树化操作，此时，树化方法会判断哈希表数组长度是否大于64，小于时不会进行树化，而是调用resize()方法进行扩容；这边的64个人推测应该是一个折衷的数字，数组长度太小，则resize()方法时间消耗不大，扩容后元素分布会更加稀疏，相对树化是更优解。
调用putAll()方法时，底层调用putMapEntries()方法，如果传如的map的size大于扩容阈值theashold，调用resize()方法。

/**
    * Initializes or doubles table size.  If null, allocates in
    * accord with initial capacity target held in field threshold.
    * Otherwise, because we are using power-of-two expansion, the
    * elements from each bin must either stay at same index, or move
    * with a power of two offset in the new table.
    *
    * @return the table
    */
   final Node<K,V>[] resize() {
       // oldTab : 扩容前的数组
       Node<K,V>[] oldTab = table;
       // oldCap : 扩容前的数组长度
       int oldCap = (oldTab == null) ? 0 : oldTab.length;
       // oldThr : 扩容前的扩容阈值
       int oldThr = threshold;
       // region 计算 newCap和newThr
       // newCap : 扩容后的数组长度，需要计算
       // newThr : 扩容后的扩容阈值，需要计算
       int newCap, newThr = 0;
       // 旧数组长度大于0，表示已经初始化过了
       if (oldCap > 0) {
           // 如果旧的数组长度已经大于最大数组长度MAXIMUM_CAPACITY，此时不再扩容，将扩容阈值设置为int最大值，直接返回
           if (oldCap >= MAXIMUM_CAPACITY) {
               threshold = Integer.MAX_VALUE;
               return oldTab;
           }
           // newCap = oldCap << 1表示新的数组大小为旧数组的2倍
           // 如果新数组的长度小于最大数组长度，且旧的数组长度大于等于默认初始长度16，扩容阈值也变为旧的2倍
           else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                    oldCap >= DEFAULT_INITIAL_CAPACITY)
               newThr = oldThr << 1; // double threshold
       }
       // 之前未初始化（put操作）过，且旧的扩容阈值大于0，有两种情况
       // new HashMap(int capacity, float loadFactor)
       // new HashMap(int capacity)
       // 此时threashold为第一次赋值的2的幂次方，所以直接赋值成为下一次的数组长度
       else if (oldThr > 0) // initial capacity was placed in threshold
           newCap = oldThr;
       // 之前未初始化（put操作）过，且旧的扩容阈值为0，有一种情况
       // new HashMap()
       // 此时新的数组长度为16，新的阈值长度为默认长度乘以默认负载因子 16 * 0.75 = 12
       else {               // zero initial threshold signifies using defaults
           newCap = DEFAULT_INITIAL_CAPACITY;
           newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
       }
       // 未设置新的扩容阈值时，根据新的数组长度和负载因子算出来
       if (newThr == 0) {
           float ft = (float)newCap * loadFactor;
           newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                     (int)ft : Integer.MAX_VALUE);
       }
       // endregion
       // 算出来的扩容阈值赋值给threshold
       threshold = newThr;
       // 创建新的数组
       @SuppressWarnings({"rawtypes","unchecked"})
           Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
       table = newTab;
       // 只有第一次创建时，旧的数组为空，此时直接返回
       if (oldTab != null) {
           // 遍历旧数组的，重新插入数据
           for (int j = 0; j < oldCap; ++j) {
               // e : 临时元素，持有当前操作的Node
               Node<K,V> e;
               // 如果当前下标数组元素不为空
               if ((e = oldTab[j]) != null) {
                   // 旧数据被e持有，后续可直接对旧数组进行GC
                   oldTab[j] = null;
                   // 此下标只有一个元素，直接计算出元素的下标，赋值到新数组
                   if (e.next == null)
                       newTab[e.hash & (newCap - 1)] = e;
                   // 如果是树节点，往红黑树插入
                   else if (e instanceof TreeNode)
                       ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                   else { // preserve order
                       // 数组扩容后，长度多了一位，旧的hash值计算出来有两种情况
                       // 高位是0，此时和旧数组中的下标一致oldHash
                       // 高位是1，此时下标为 oldHash + oldCap
                       // 低位链表 loHead链表头，loTail链表尾部
                       Node<K,V> loHead = null, loTail = null;
                       // 高位链表
                       Node<K,V> hiHead = null, hiTail = null;
                       Node<K,V> next;
                       do {
                           next = e.next;
                           // (e.hash & oldCap) == 0表明新hash值是低位
                           if ((e.hash & oldCap) == 0) {
                               // 第一次插入loTail为空，把loHead赋值为当前Node
                               if (loTail == null)
                                   loHead = e;
                               // 不是第一次插入，在loTail后面新插入即可
                               else
                                   loTail.next = e;
                               // loTail一直指向最新插入的值
                               loTail = e;
                           }
                           // 高位插入，具体逻辑和低位类似
                           else {
                               if (hiTail == null)
                                   hiHead = e;
                               else
                                   hiTail.next = e;
                               hiTail = e;
                           }
                       } while ((e = next) != null);
                       if (loTail != null) {
                           // 去掉新插入后低位链表尾部的引用
                           loTail.next = null;
                           // 把低位链表头赋值到数组里，下标是当前下标
                           newTab[j] = loHead;
                       }
                       if (hiTail != null) {
                           // 去掉新插入后低位链表尾部的引用
                           hiTail.next = null;
                           // 把低位链表头赋值到数组里，下标是当前下标 + oldCap
                           newTab[j + oldCap] = hiHead;
                       }
                   }
               }
           }
       }
       return newTab;
   }

参考文档

[1] JDK 源码中 HashMap 的 hash 方法原理是什么？

[2] HashMap及其原理

1. 基础

2. 常量，变量和构造方法

a. 常量

b. 变量

c. 构造方法

d. tableSizeFor(int)方法

3. get()方法

a. 基本逻辑

b. get()方法

c. hash()方法

d. getNode()方法

4. put()方法

a. 基本逻辑

b. putVal()方法

5. resize()方法

参考文档

d. `tableSizeFor(int)`方法

b. `get()`方法

c. `hash()`方法

d. `getNode()`方法

b. `putVal()`方法