Improve implementation of Vector.tabulate
Currently, Vector.tabluate is a straightforward translation of the
following function from mlvectorScript.sml:
val tabulate_def = Define`
tabulate n f = Vector (GENLIST f n)`;
For large n, this is a bad implementation since a list of size n
requires 3 * n words of space on the heap. Furthermore, the final
vector requires n words of space, and when the vector is created in
data_to_word both the list and the space for the new vector must be
present in the heap simultaneously. Thus the space required to create
a vector of length n is a total of 4 * n words of memory during
creation. (Once the vector has been created the intermediate list
becomes garbage which will eventually be removed by the GC.)
With the current implementation of the compiler, where vectors are
represented as Block in BVL and the only operation to produce such
in data_to_wordis FromList, this space inefficiency is
unavoidable.
Option 1
I propose that each vector is implemented in BVL as a ValueArray
instead of a Block. The idea is that the source would have a new
primitive, VectorFromArray with which Vector.tabluate would be
implemented, i.e. first producing an array with the desired content,
then apply the primitive to this array in order to turn it into
vector. By doing so, Vector.tabluate could perform allocation with a
cost of 2 * n words of memory, instead of 4 * n words of
memory. This wouldn't be much of a speed boost, but it would reduce
this momentary peak in space consumption by half.
One benefit of this scheme is that the slightly involved verification
of FromList in data_to_word_assignProof could be replaced by a
higher-level and easier-to-maintain verification proofs of FromList
and FromArray stub code in pat_to_closProof.
The downside of this scheme is that all vectors (of which there aren't many) are treated less efficiently by the generational garbage collector, because it treats them as potentially mutable.
An even more space efficient (but much more challenging) approach
would be to insert a VecTabluate primitive to source and compile it
only in pat_to_clos. This avoids the intermediate array and can thus
consume only n words of memory at creation. However, this requires
supporting a primitive that calls evaluate internally for the
function f and it requires changing pat_to_clos to have an
existential in the proof goal (changes the current nice shape of the
proof completely). I do not recommend this option.
Option 2
I propose that we keep representing vectors as Blocks in BVL, and
instead (1) add a new primitive FromArray to every intermediate
language down to and including dataLang, and (2) compile this in
data_to_word to a call to a memcopy stub.
Recommendation
I prefer option 2 because it has the same peak space consumption as option 1 but is better in that it allows the GC to treat vectors as immutable data.
This was motivated by problems with space consumption that @tanyongkiam found when benchmarking the compiler.
See comment by @myreen https://hol.zulipchat.com/#narrow/channel/518311-CakeML-issue-review/topic/Improve.20implementation.20of.20Vector.2Etabulate.20.23411/near/531273893
This should be open. I'm still of the opinion that option 2 is the right way to go, but we could have two versions: a typesafe one that is memcopy and an unsafe one that just edits the header of the array so that it becomes a Block.