Improve implementation of Vector.tabulate

Open myreen opened this issue 8 years ago • 1 comments

Currently, Vector.tabluate is a straightforward translation of the following function from mlvectorScript.sml:

val tabulate_def = Define`
  tabulate n f = Vector (GENLIST f n)`;

For large n, this is a bad implementation since a list of size n requires 3 * n words of space on the heap. Furthermore, the final vector requires n words of space, and when the vector is created in data_to_word both the list and the space for the new vector must be present in the heap simultaneously. Thus the space required to create a vector of length n is a total of 4 * n words of memory during creation. (Once the vector has been created the intermediate list becomes garbage which will eventually be removed by the GC.)

With the current implementation of the compiler, where vectors are represented as Block in BVL and the only operation to produce such in data_to_wordis FromList, this space inefficiency is unavoidable.

Option 1

I propose that each vector is implemented in BVL as a ValueArray instead of a Block. The idea is that the source would have a new primitive, VectorFromArray with which Vector.tabluate would be implemented, i.e. first producing an array with the desired content, then apply the primitive to this array in order to turn it into vector. By doing so, Vector.tabluate could perform allocation with a cost of 2 * n words of memory, instead of 4 * n words of memory. This wouldn't be much of a speed boost, but it would reduce this momentary peak in space consumption by half.

One benefit of this scheme is that the slightly involved verification of FromList in data_to_word_assignProof could be replaced by a higher-level and easier-to-maintain verification proofs of FromList and FromArray stub code in pat_to_closProof.

The downside of this scheme is that all vectors (of which there aren't many) are treated less efficiently by the generational garbage collector, because it treats them as potentially mutable.

An even more space efficient (but much more challenging) approach would be to insert a VecTabluate primitive to source and compile it only in pat_to_clos. This avoids the intermediate array and can thus consume only n words of memory at creation. However, this requires supporting a primitive that calls evaluate internally for the function f and it requires changing pat_to_clos to have an existential in the proof goal (changes the current nice shape of the proof completely). I do not recommend this option.

Option 2

I propose that we keep representing vectors as Blocks in BVL, and instead (1) add a new primitive FromArray to every intermediate language down to and including dataLang, and (2) compile this in data_to_word to a call to a memcopy stub.

Recommendation

I prefer option 2 because it has the same peak space consumption as option 1 but is better in that it allows the GC to treat vectors as immutable data.

This was motivated by problems with space consumption that @tanyongkiam found when benchmarking the compiler.

Dec 21 '17 06:12 myreen

See comment by @myreen https://hol.zulipchat.com/#narrow/channel/518311-CakeML-issue-review/topic/Improve.20implementation.20of.20Vector.2Etabulate.20.23411/near/531273893

This should be open. I'm still of the opinion that option 2 is the right way to go, but we could have two versions: a typesafe one that is memcopy and an unsafe one that just edits the header of the array so that it becomes a Block.

Jul 29 '25 04:07 ordinarymath