Collections(二)List上篇-ArrayList

Open Sayi opened this issue 7 years ago • 0 comments

List是一个有序的序列结构，基于 位置访问 ，List接口在Collection接口上新增加了一些方法，允许在指定位置插入和删除元素，也允许对元素的搜索(indexOf)。List是如何实现Iterator迭代器，ArrayList 和 LinkedList的性能差异为何会表现在不同方面，本文将对ArrayList进行详细分析。

List接口

我们先来看看List接口，提供了基于位置访问的一些方法。

方法	作用
E get(int index)	从某个位置获取元素
E set(int index, E element)	某个位置设置元素
void add(int index, E element)	某个位置增加元素
E remove(int index)	移除某个位置的元素
int indexOf(Object o)	搜索某个元素首次出现的位置
int lastIndexOf(Object o)	搜索某个元素最后出现的位置

ArrayList实现原理

ArrayList是基于可变数组实现的，内部维护了一个Object[] elementData数组，同时提供了操纵数组容量的方法，List的大小是由size变量决定的，容量是由数组的大小决定的，容量Capacity不满足size大小时，ArrayList就会自动通过合适的扩容策略进行扩容。

初始容量

ArrayList遵循上一节转换构造器的原则：提供了一个无参构造方法和Collection为参数的构造方法。

同时，还提供了一个int类型的参数来定义内部数组的容量public ArrayList(int initialCapacity)，这个参数的初始容量默认是定义的一个常量：

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

两个问题：1.ArrayList什么时候会去增长容量？2.它的容量增长策略是怎么实现的呢？

需要扩容的场景肯定是因为容量无法满足元素的个数了，所以在增加元素add的时候，或者增加集合元素addAll，或者以另一个集合初始化ArrayList的时候，都会去判断是否增加容量，本章以及接下来的所有章节增长容量策略的分析都将依据 增加元素 源码进行分析。

增加元素和增长容量策略分析

直接看源码，数组的复制使用了高效的System.arraycopy方法，参数可以是不同数组，也可以作用于同一个数组。

public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!

    elementData[size++] = e;
    return true;
}

public void add(int index, E element) {
    rangeCheckForAdd(index); // 检查下标是否越界
    ensureCapacityInternal(size + 1);  // Increments modCount!!

    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}

方法ensureCapacityInternal(size + 1);是为了容量增长策略设计的，我们看下这个方法：

private void ensureCapacityInternal(int minCapacity) {
  if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
    minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
  }

  ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
  modCount++; // 这是为fail-fast机制设计的变量

  // overflow-conscious code
  if (minCapacity - elementData.length > 0)
    grow(minCapacity);
}

private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

private void grow(int minCapacity) {
  // overflow-conscious code
  int oldCapacity = elementData.length;
  int newCapacity = oldCapacity + (oldCapacity >> 1);
  if (newCapacity - minCapacity < 0)
    newCapacity = minCapacity;
  if (newCapacity - MAX_ARRAY_SIZE > 0)
    newCapacity = hugeCapacity(minCapacity);
  // minCapacity is usually close to size, so this is a win:
  elementData = Arrays.copyOf(elementData, newCapacity);
}

private static int hugeCapacity(int minCapacity) {
  if (minCapacity < 0) // overflow
    throw new OutOfMemoryError();
  return (minCapacity > MAX_ARRAY_SIZE) ?
    Integer.MAX_VALUE :
    MAX_ARRAY_SIZE;
}

这部分源码其实很好理解，我们能得出以下几个结论，这些结论很好的回答了上文中提到的两个问题：

if (minCapacity - elementData.length > 0) grow(minCapacity); 当元素个数size超过了内部维护的数组容量大小时，就会去扩容
int newCapacity = oldCapacity + (oldCapacity >> 1); 增加的容量是当前容量的二分之一，至于为什么是1/2，这应该是JDK设计者的一个折中选择问题
采用Arrays.copyOf复制元素到新的空间，底层也是采用System.arraycopy方法

基础源码分析

获取元素

public E get(int index) {
    rangeCheck(index);

    return elementData(index);
}

rangeCheck检测角标index是否越界合法。

更新元素

public E set(int index, E element) {
    rangeCheck(index);

    E oldValue = elementData(index);
    elementData[index] = element;
    return oldValue;
}

通过set方法设置元素，会返回老的元素值。

删除元素

public E remove(int index) {
    rangeCheck(index);

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}

和增加元素一样，删除元素的代码也采用了System.arraycopy方法进行数组移位可以得出结论，ArrayList在插入和删除操作会花费一定的代价，随机访问操作的时间复杂度将是常量级的。

索引元素

public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

iterator源码实现和fail-fast设计

private class Itr implements Iterator<E> {
  int cursor;     // index of next element to return
  int lastRet = -1; // index of last element returned; -1 if no such
  int expectedModCount = modCount;

  public boolean hasNext() {
    return cursor != size;
  }

  @SuppressWarnings("unchecked")
  public E next() {
    checkForComodification();
    int i = cursor;
    if (i >= size)
      throw new NoSuchElementException();
    Object[] elementData = ArrayList.this.elementData;
    if (i >= elementData.length)
      throw new ConcurrentModificationException();
    cursor = i + 1;
    return (E) elementData[lastRet = i];
  }

  public void remove() {
    if (lastRet < 0)
      throw new IllegalStateException();
    checkForComodification();

    try {
      ArrayList.this.remove(lastRet);
      cursor = lastRet;
      lastRet = -1;
      expectedModCount = modCount;
    } catch (IndexOutOfBoundsException ex) {
      throw new ConcurrentModificationException();
    }
  }

  @Override
  @SuppressWarnings("unchecked")
  public void forEachRemaining(Consumer<? super E> consumer) {
    Objects.requireNonNull(consumer);
    final int size = ArrayList.this.size;
    int i = cursor;
    if (i >= size) {
      return;
    }
    final Object[] elementData = ArrayList.this.elementData;
    if (i >= elementData.length) {
      throw new ConcurrentModificationException();
    }
    while (i != size && modCount == expectedModCount) {
      consumer.accept((E) elementData[i++]);
    }
    // update once at end of iteration to reduce heap write traffic
    cursor = i;
    lastRet = i - 1;
    checkForComodification();
  }

  final void checkForComodification() {
    if (modCount != expectedModCount)
      throw new ConcurrentModificationException();
  }
}

Itr是ArrayList的一个内部类，所以可以通过ArrayList.this方式引用ArrayList内部域。

hasNext()和next()方法的实现就是不断向后移动下标cursor，直到cursor==size为止。

上一篇文章说过，ConcurrentModificationException是由迭代器抛出来的异常，从源码中我们可以看到，当modCount != expectedModCount时，就会提前抛出异常，即fail-fast，其中expectedModCount是在构造Itr对象时产生的modCount是存在ArrayList的一个变量，在每次新增元素、删除元素或者结构改变(比如排序等)，就会累加modCount，上文ensureExplicitCapacity源码中就能看到modCount++操作。

我们已经知道，在迭代集合时，是不允许更改集合内部结构的，如果希望某个符合条件的元素删除呢？答案是使用java.util.Iterator.remove()方法。

ArrayList提供了方法ListIterator<E> listIterator()获取ListIterator迭代器，这是一个为list打造的迭代器，允许双向遍历和更多操作。

public interface ListIterator<E> extends Iterator<E> {
    // Query Operations
    boolean hasNext();
    E next();
    boolean hasPrevious();
    E previous();
    int nextIndex();
    int previousIndex();

    // Modification Operations
    void remove();
    void set(E e);
    void add(E e);
}

性能

性能问题分为时间复杂度和空间复杂度两方面去讨论。

ArrayList实现了RandomAccess接口，它长于随机访问，不擅于插入和删除，这与LinkedList是相反的。

如果初始容量过大实际元素过少，则会造成空间的浪费。如果一次插入的元素很多，而初始容量不够，则会频繁的扩容，这无疑增加的时间复杂度。

为了性能考虑，在插入大量数据的时候，我们可以选择一个合适的初始容量，如果在一个初始容量已定的ArrayList中插入大量数据，可以通过调用方法`public void ensureCapacity(int minCapacity) 确保一次性扩容足够大的容量。

如果ArrayList中元素已经添加完毕，我们可以通过调用方法 public void trimToSize() 将内部数组的容量调整为实际元素的size大小来节约空间。

下篇文章，我们将会对LinkedList进行分析。

Aug 30 '18 07:08 Sayi