umka-lang icon indicating copy to clipboard operation
umka-lang copied to clipboard

Inserting into dynamic arrays isn't that fast, the array is copied on every insert

Open vtereshkov opened this issue 3 years ago • 6 comments

This isn't a big deal for small arrays, but for dynamic arrays with many elements, it's noticeable that every new insert is significantly slower than the one before. I'd much rather prefer a dynamic array like std::vector, as I honestly haven't yet found a use for copied dynamic arrays.

https://github.com/marekmaskarinec/tophat/issues/30#issuecomment-1158684630

vtereshkov avatar Jun 17 '22 10:06 vtereshkov

@skejeton Yes, that's a known problem. While Umka mimics Go syntactically, it has a different internal represenation of dynamic arrays. Instead of maintaining both length and capacity, it has only length. This prevents append() from making in-place modifications of the array. On the other hand, such a simplistic approach allows me to avoid the following (insane?) behavior: https://go.dev/play/p/N3VZraSGOeu

As a temporary workaround, I'd recommend filling a large dynamic array by indices instead of calling append() for each new item:

a := make([]T, 1000)
// ...
a[i] = T{/*...*/}
// ...
if i == len(a) - 1 {
    a = append(a, make([]T, len(a)))    // Resize
}

vtereshkov avatar Jun 19 '22 23:06 vtereshkov

Can't this technically be solved by checking if lhs and the first argument to append is the same memory location? Then unless the reference count is more than one, just append to it like you'd do in something like vector. I don't think there's a need to expose cap as an API like go did.

ske2004 avatar Aug 27 '22 08:08 ske2004

@skejeton I thought of this solution. However, the problem is that the LHS may not even exist:

return append(a, b)
c := append(a, b)
len(append(a, b))

or may be hard to define formally:

c = (((append(a, b))))
c, d = e, append(a, b)

This could be easier with an AST-based compiler. In Umka, this could only be done with an elaborate peephole optimization at the bytecode level. Perhaps someday I'll do this.

vtereshkov avatar Aug 27 '22 12:08 vtereshkov

@skejeton @marekmaskarinec As a compromise, I could suggest a new built-in function (say, push()) that acts like append(), but adds new items in-place, without copying the whole array to a new location.

vtereshkov avatar Sep 08 '22 21:09 vtereshkov

@skejeton @marekmaskarinec As a compromise, I could suggest a new built-in function (say, push()) that acts like append(), but adds new items in-place, without copying the whole array to a new location.

Sounds good to me. But it would also make sense to duplicate it for insert then. For example my buffer manipulation code inserts lines in the middle, although it wasn't much of a concern for now as it just stores an array of strings (effectively just a bunch of pointers). At least mutative alternative to append would be appreciated indeed.

For insert, place or put would be appropriate, however I think before that there needs to be the builtin name squatting problem solved because I have some methods named put.

ske2004 avatar Sep 10 '22 18:09 ske2004

Recent observations: resizeable dynamic arrays cannot store len directly in the DynArray structure. It seems that we have to store a pointer to len instead, so we'll need to rework DynArray and all its usages. Otherwise, we'll face a trouble similar to #169.

vtereshkov avatar Sep 17 '22 10:09 vtereshkov