ply icon indicating copy to clipboard operation
ply copied to clipboard

Can we safely optimize reassignments?

Open lukechampine opened this issue 8 years ago • 3 comments

We already optimize for most cases when the result of a ply function/method is reassigned to an existing memory:

xs := make([]int, 50, 100)
xs = []int{1,2,3}.morph(func(i int) int { return i * i })

In the above code, the morphed values are appended to xs[:0], so len(xs) is 3 and cap(xs) is still 100. (EDIT: this is no longer the case.)

In theory, we can optimize more aggressively when we can prove that the data is not used after the function/method is called on it. A trivial case of this is when calling methods on a slice literal: clearly, later code cannot reference the literal, so we are free to modify it in-place. Of course, things are much trickier when the data is already referenced by a variable somewhere, like so:

func foo(xs []int) int {
    ys := xs.reverse()
    return ys[0]
}

Even though we can prove that foo doesn't modify xs, we can't easily prove that the memory referenced by xs isn't being used elsewhere.

We could detect reassignments to the exact same variable, as in:

xs = xs.reverse()

It should be safe to reverse xs in-place, right?

Well, it depends on what you think is the most intuitive. Remember, if another slice has a pointer to the same data as xs, then it will be modified too. And that could be quite surprising! On the other hand, we would really like to avoid needless allocations. So which should we prioritize? Or is there a way to achieve both?

Here are three solutions that come to mind. Both are flawed:

  • Optimize by default. If you need to disable the optimization, assign to a new variable.

This is bound to lead to at least one tortuously long debugging session. Besides, it's just ugly, especially if you really do want to reassign the new memory to the existing variable. Then you need to assign to a temp var and back to the original -- gross.

  • Allocate by default. If you want to optimize, write xs.reverse() as a statement instead of xs = xs.reverse().

This only works when the return value is the same type of xs. Furthermore, what if the method has to change the length or capacity of xs? There's no way to do that unless you have a pointer to the slice, which we don't. And even if we did, it would be surprising behavior.

  • Have the caller explicitly pass the memory to reuse as an additional argument.

It's tempting to simply punt the issue to the programmer, but it's also not very ergonomic. The whole raison d'etre of Ply is to reduce the amount of code you need to write, and this solution is clearly counter to that goal. On the other hand, this scenario may be much less common than I'm making it out to be. Perhaps I should give it some time to percolate and see how big of an issue this really is.

lukechampine avatar Dec 31 '16 09:12 lukechampine

~Three~ Four more potential solutions:

  • Provide separate functions that can reuse memory, e.g. rfilter, rmorph, etc.

This is easiest to implement on the backend, but I don't like the idea of having three versions of each function -- I'm already planning on adding parallel versions, e.g. pfilter and pmap. Actually, you'd need four, since you might want to reuse memory and run in parallel. Blegh.

  • Indicate reassignment via a compile directive (e.g. xs.filter(even) // ply: reuse xs).

No.

  • Optimize the use of copy and append (e.g. append(xs[:0], xs.filter(even))).

This is tempting, but it still doesn't match the expectations of the average Go programmer. They will be expecting xs.filter to allocate and return new memory. Also, this doesn't work for maps.

  • Use a new token for reassignments, or overload an old one (e.g. xs == xs.filter(even)).

No.

It's worth noting that all of these require the programmer to specify the optimization, whereas the current approach "just works." My guess is that 95% of the time, when you write xs = xs.filter(even), you want to reuse xs's memory. But I'm just not comfortable making that decision for the programmer 100% of the time.

I'm leaning towards the first approach mentioned in this comment (r* variants), but it's still too early to make a decision. I'm going to disable the reassignment optimization for now and wait for more ideas or feedback.

lukechampine avatar Jan 01 '17 21:01 lukechampine

Another potential solution: immutable slices.

If we introduce an immutable builtin (or keyword), then this optimization may simply "fall out" as a consequence. For example, this code would always allocate a new slice:

xs := immutable([]int{1,2,3})
xs = xs.filter(even)

Whereas without the immutable cast, xs would always be reused.

This is still a bit surprising though. It might be better to instead make slices immutable by default. Then the programmer can optimize as needed by using a mut builtin:

xs := mut([]int{1,2,3})
xs = xs.filter(even)

Of course, adding such a builtin would likely be a very large undertaking, especially if we go the latter route. See this proposal.

lukechampine avatar Feb 22 '17 22:02 lukechampine

Another potential solution: a dup function. Like so:

xs = xs.filter(even) // reuses memory

xs = dup(xs).filter(even) // doesn't

I like this because it makes the cost explicit. On the other hand, don't we want to encourage immutable values? Maybe Go isn't the right language to push that paradigm in. Another downside to this approach is that you can't write the values directly into previously-allocated memory. Perhaps the most practical thing to do here is to simply detect uses of append. For example, we could optimize this call to write directly into ys:

ys = append(ys[:0], xs.filter(even)...)

But as mentioned in a previous comment, this doesn't apply to maps.

lukechampine avatar Feb 28 '17 23:02 lukechampine