ponyc
ponyc copied to clipboard
Memory corruption via Array.copy_to
There's bugs in Array's copy to where it doesn't validate input
for example:
actor Main
new create(e: Env) =>
var src: Array[U8] = [1]
var dest: Array[U8] = [11; 1; 2; 3; 4; 5; 6]
try
e.out.print(dest(0)?.string())
end
src.copy_to(dest, 11, 0, 10)
try
e.out.print(dest(0)?.string())
end
prints 11 and then prints 0. copying from an out of index src_index
shouldn't be changing the destination array, it should be an error.
To fix, copy_to needs to become partial.
As part of this change, the slice
method that use copy_to
be switched to using _copy_to
so that it isn't required to become partial UNLESS, slice actually needs to be partial. We should validate that it doesn't.
Some things that we should be adding copy_to
tests methods for:
- src index is outside of range (should be a bug now)
- dest index is outside of valid range (should be handled correctly now)
- length to copy goes outside of valid range (should be a bug now)
- array was never initialized (so _ptr is null) (see example-ish test in #4173)
- array was initialized but is now empty (_ptr isn't null, but size is 0) (see example-ish test in #4173)
Here's some example code that shows the length bug:
actor Main
new create(e: Env) =>
var src: Array[U8] = [1]
var dest: Array[U8] = [11; 1; 2; 3; 4; 5; 6]
try
e.out.print(dest(3)?.string())
end
src.copy_to(dest, 0, 0, 10)
try
e.out.print(dest(3)?.string())
end
I don't think silently not doing this is a good API here. Most of Array if you overshoot an index are partial...
see
- read_U8
- read_U16
- etc for read's
- update_U8
- etc for update's
- insert
- delete
Note, it appears that copy_from
has similar issues.
I don't think silently not doing this is a good API here. Most of Array if you overshoot an index are partial...
see
- read_U8
- read_U16
- etc for read's
- update_U8
- etc for update's
- insert
- delete
This is exactly how for example remove
operates. If you ask for too many to be removed it silently only removes as many as it can. I am not sure that I would agree that doing it so would be bad.
The way I propose to fix #4172 and this here is consistent with the way this is handled in remove
( see let count = n.min(_size - i)
)
Looks like this has been a bug since pretty much the very beginning (8 years ago).
I would take the position that silently truncating the range is the most consistent thing to do here, rather than making the method partial.
The general precedent in the String and Array methods seems to be:
- If dealing with a single element index that is out of range, raise an error
- If dealing with a span of element indexes, part or all of which are out of range, silently truncate the span to be in range (possibly making it empty, if all of it was out of range)
If you were going to take the position that the latter class of cases should also be partial, you're going to need to make a lot more of the methods partial than just copy_to
and copy_from
.
Re: "dest index is outside of valid range (should be handled correctly now)"
It's worth noting that by requesting a destination index out of bounds, in a way that leaves "gaps" in between, we might also encounter memory corruption. The code below will segfault in release and debug modes:
class Counter
var value: U32 = 0
new create(v': U32) =>
value = v'
actor Main
new create(e: Env) =>
let src = [Counter(0)]
let dest = [Counter(1)]
src.copy_to(dest, 0, 2, 1) // past the range
try
e.out.print(dest(0)?.value.string()) // valid memory
e.out.print(dest(1)?.value.string()) // invalid memory
end
Yeah, we definitely don't want to leave gap in between in whatever the desired behavior for this method is.
We discussed this a lot on the sync call and my basic thought is:
I don't mind removing copy_to
and/or copy_from
, and I also don't mind making them partial, as long as we also provide a new method (e.g. something like append_array
) because I think the dst_idx
is where the main need for partial-ness comes from, and there's no example I could find on GitHub out in the wild that needed to copy to any destination index besides just the end of the destination buffer. These use cases can't currently use append
because it's non-optimal due to not requiring the source to be contiguous memory. If we add a new method like append
that uses a contiguous source buffer, I think it serves all current use cases of copy_to
and copy_from
in the wild.
I updated the fix in #4173 to catch a destination index that goes past dest._size - 1
.
@jemc I too don't mind removing copy_to
and copy_from
(especially copy_from
which is a bad mirror of copy_to
that is restricted to U8
?? and which can also be expressed by an equivalent copy_to
call) so as long as a low overhead function exists that allows me to copy an entire array into a target array. Being required to call clear
on the destination before calling such a new append
is still low overhead.
However, it is also worth noting that both, making the existing functions partial or removing them altogether would break a lot of my modules.
After discussing this topic with Sean today, here's a path forward that at least he and I are in agreement on (others here can chime in as well).
This mostly matches what @ergl and I discussed in the sync call 10 days ago, so to hear more about the rationale, listen to that recording.
Make the copy_to
method partial, but only one case would result in an error being raised:
- When some part of the source slice is out of range, the number of elements copied is silently truncated so that it only copies in-range bytes from the source.
- Instead of returning
None
, returnUSize
indicating the actual number of elements that were copied (after the aforementioned truncation) - When the destination slice is fully within range, the copied elements replace the elements currently in that destination slice
- When the destination index is equal to the current size of the destination array, the copied elements are appended, increasing the array's size by the copied amount
- When the destination slice is only partly within range, the in-range part of the copied elements replace existing elements and the rest are appended, increasing the destination size
- When the destination index is beyond the range, such that if we did expand the size and do the copy there would be a gap of "undefined" elements before the copied elements, then
copy_to
will raise an error (this is the only case that makes it partial)
Add a new copy_append
method which does not allow specifying destination index, and thus would be a non-partial alternative.
Make copy_from
partial, remove the artificial U8
restriction, and implement it in terms of copy_to
so that the semantics are the same as copy_to
(but mirrored)
Add a new copy_from_append
method, which mirrors copy_append
in the same way.
This sounds like a good way forward -- I closed #4173.
@stefandd would you be interested in implementing the above? You already did a decent amount of work towards the tests and implementation.
@jemc @SeanTAllen I am happy to, for a start, work on the new copy_to
and copy_from
functions. The final version of copy_to
I had in my PR met the first 5 out of 6 description points above already.
Sounds great, if you're willing to continue on that path. Thanks!
@jemc
Add a new
copy_append
method which does not allow specifying destination index, and thus would be a non-partial alternative.
Does copy_append
allow you to specify a source index and a length? I guess so, but I wanted to be sure.
If so, and with the addition of copy_append
, I feel like keeping the "appending" cases in copy_to
/ copy_from
is a bit redundant.
If we're making a breaking change, I'd prefer to specify that any out of range scenario in copy_to
/ copy_from
results in truncation, and leave any appending operations to the new methods being introduced. That way, we avoid marking copy_to
partial. For example:
When the destination index is equal to the current size of the destination array, the copied elements are appended, increasing the array's size by the copied amount
When the destination slice is only partly within range, the in-range part of the copied elements replace existing elements and the rest are appended, increasing the destination size
When the destination index is beyond the range, such that if we did expand the size and do the copy there would be a gap of "undefined" elements before the copied elements, then
copy_to
will raise an error (this is the only case that makes it partial)
These would be modified to the following:
-
When the destination slice is only partly within range, the in-range part of the copied elements replace existing elements and the rest are truncated.
-
When the destination index is equal to the current size of the destination array, or the index is beyond the range, nothing is copied, and the return value is zero.
That's an interesting point, and I definitely like the reduced conceptual complexity of that approach (especially that it avoids the mental overhead of remembering which cases truncate and which cases raise an error), though I think it has two downsides to mention:
-
There wouldn't be any method that serves the use case of copying into a place "near" the end of the destination, using some elements to overwrite, and the rest of the elements being appended to extend the destination. There would be no way to do this in just one copy - you'd have to do first one
copy_to
for the overwrite part and then onecopy_append
for the following append part. -
This would become a silently breaking change rather than a "fail at compile time" breaking change. That is, in the approach I outlined above, we'd be making
copy_to
andcopy_from
partial, which would force everyone to either handle the error case or, more likely, switch tocopy_append
andcopy_from_append
(because all the real-world cases I know of are doing the append case). In contrast, the approach you outlined would not force any compile-time failures, and would instead just start failing at runtime for everyone who didn't pay attention to the release notes and update their program.
If we want to take your approach then we may want to consider at least changing the function names for force a compile-time failure so that we don't make a silently breaking behavioral change to an existing function.
@ergl @jemc
Please indicate if/when a consensus has been reached among the core developers on how to proceed..
@stefandd No consensus as of yet afaik. I won't be able to give this the time that it needs until next month.
@jemc let's you and i discuss this issue and come to a resolution so we can unblock this and hopefully @stefandd is still interested in fixing it.
@SeanTAllen I am happy to take a stab at implementing a proposed change to copy_to
and copy_from
We discussed this in sync today. @SeanTAllen and I still want to go with the partial approach that I outlined in this comment: https://github.com/ponylang/ponyc/issues/4174#issuecomment-1228815465
@ergl - what do you think? Do you think you can align with this same consensus? Or do you still prefer the alternative you outlined in this comment: https://github.com/ponylang/ponyc/issues/4174#issuecomment-1229906875
@jemc Your proposal sounds good. I came round on your idea: your point about having a silent breaking change was a good one.
I believe we have a quorum of agreement now. @stefandd are you still up for working on this?