a faster immutable stack/list
The canonical encoding of list in scala 2 would be:
sealed trait List[+A]
object List {
case object Empty extends List[Nothing]
case class NonEmpty[A](head: A, tail: List[A]) extends List[A]
}
but this implementation involves a lot of pointer chasing (to iterate N times into the List we need to dereference N times).
We know from work on immutable vectors that having blocks of arrays that we copy on change can give considerable improvements due to cache locality (and fewer derefs).
So one can imagine another list:
sealed trait List[+A] {
def uncons: Option[(A, List[A])]
def prepend(a: A): List[A]
}
object List {
private val BlockSize = 16 // experiment on this
private case object Empty extends List[Nothing] {
def uncons = None
def prepend(a: A) = {
val ary = new Array[Any](BlockSize)
val offset = BlockSize - 1
ary(offset) = a
Impl(offset, ary, Empty)
}
}
private case class Impl[A](offset: Int, block: Array[Any], tail: List[A]) extends List[A] {
def uncons = {
val nextOffset = offset + 1
val next = if (nextOffset == block.length) tail else Impl(nextOffset, block, tail)
Some((block(offset).asInstanceOf[A], next)
}
def prepend(a: A) = {
val ary = new Array[Any](BlockSize)
if (offset > 0) {
// copy the right side
System.arraycopy(block, offset, ary, offset, BlockSize - offset)
val nextOffset = offset - 1
ary(nextOffset) = a
Impl(nextOffset, ary, tail)
}
else {
val offset = BlockSize - 1
ary(offset) = a
Impl(offset, ary, this)
}
}
}
or something like this...
The idea is to copy the block on write. Of course, apply, map, foldLeft, foreach, iterator, etc... could be significantly optimized since once you have a handle on the block iterating to the next value will be in cache.
With some appropriate benchmarks you could set the size of the block to some optimal value that I bet would be bigger than 1. It could be that this beats naive List in almost all cases, and when the List isn't too large it could compete with Vector (which isn't as fast as List for stack uses).
I like this!
A couple other thoughts:
-
creating a
Listby concatenating to a mutable builder would also be a lot faster since you can just place elements in theArray. I guess you'd have to allow the last block to be incomplete (needs logic but seems fine ...) -
I wonder if it's worth to try to avoid copying the block on the first prepend, optimistically assuming that it will not be prepended more than once. scodec
ByteVectorhas some optimizations like this for appending:
https://github.com/scodec/scodec-bits/blob/8a23a541def13150768c83aa0b077afbda4b3f34/core/shared/src/main/scala/scodec/bits/ByteVector.scala#L2311-L2317
Perhaps this will serve as inspiration: https://github.com/sageserpent-open/NTestCaseBuilder/blob/master/development/solution/SageSerpent.Infrastructure/ChunkedList.fs ?
It has a rather horrific test too: https://github.com/sageserpent-open/NTestCaseBuilder/blob/master/development/solution/SageSerpent.Infrastructure.Tests/ChunkedListTests.fs
Hi @armanbilge, @johnynek I've implemented the chunked list locally with about 78% tests passing.
The implementation uses prepend-based stack semantics as sketched, but I'm getting Cats law violations in Traverse and Monoid tests. For example, foldLeft processes elements in reverse order compared to standard List expectations.
Should we embrace these stack semantics (possibly renaming to ChunkedStack) or rework for law-abiding list behavior?
Would appreciate your guidance on the intended direction!
Thanks 😊
Hi @najuna-brian, thanks for your interest in this!
Would appreciate your guidance on the intended direction!
The best way to get feedback/guidance from us is to open a (draft) PR. Then, it is easier for us to understand your work and the questions/problems you have encountered.
I've implemented the chunked list locally with about 78% tests passing.
Outreachy is currently in the application phase. During this phase we recommend working on good first issues, so that you can get familiar with Typelevel and also so that we can get to know you ☺️ if you complete the entire project during the application period, then that would make you ineligible to work on the project for your internship!
Thank you for the guidance @armanbilge I will please do as guided ☺️