ponyc icon indicating copy to clipboard operation
ponyc copied to clipboard

Memory corruption via Array.copy_to

Open SeanTAllen opened this issue 1 year ago • 16 comments

There's bugs in Array's copy to where it doesn't validate input

for example:

actor Main
  new create(e: Env) =>
    var src: Array[U8] = [1]
    var dest: Array[U8] = [11; 1; 2; 3; 4; 5; 6]
    try
      e.out.print(dest(0)?.string())
    end
    src.copy_to(dest, 11, 0, 10)
    try
      e.out.print(dest(0)?.string())
    end

prints 11 and then prints 0. copying from an out of index src_index shouldn't be changing the destination array, it should be an error.

To fix, copy_to needs to become partial.

As part of this change, the slice method that use copy_to be switched to using _copy_to so that it isn't required to become partial UNLESS, slice actually needs to be partial. We should validate that it doesn't.

Some things that we should be adding copy_to tests methods for:

  • src index is outside of range (should be a bug now)
  • dest index is outside of valid range (should be handled correctly now)
  • length to copy goes outside of valid range (should be a bug now)
  • array was never initialized (so _ptr is null) (see example-ish test in #4173)
  • array was initialized but is now empty (_ptr isn't null, but size is 0) (see example-ish test in #4173)

Here's some example code that shows the length bug:

actor Main
  new create(e: Env) =>
    var src: Array[U8] = [1]
    var dest: Array[U8] = [11; 1; 2; 3; 4; 5; 6]
    try
      e.out.print(dest(3)?.string())
    end
    src.copy_to(dest, 0, 0, 10)
    try
      e.out.print(dest(3)?.string())
    end

SeanTAllen avatar Aug 07 '22 14:08 SeanTAllen

I don't think silently not doing this is a good API here. Most of Array if you overshoot an index are partial...

see

  • read_U8
  • read_U16
  • etc for read's
  • update_U8
  • etc for update's
  • insert
  • delete

SeanTAllen avatar Aug 07 '22 16:08 SeanTAllen

Note, it appears that copy_from has similar issues.

SeanTAllen avatar Aug 07 '22 16:08 SeanTAllen

I don't think silently not doing this is a good API here. Most of Array if you overshoot an index are partial...

see

  • read_U8
  • read_U16
  • etc for read's
  • update_U8
  • etc for update's
  • insert
  • delete

This is exactly how for example remove operates. If you ask for too many to be removed it silently only removes as many as it can. I am not sure that I would agree that doing it so would be bad. The way I propose to fix #4172 and this here is consistent with the way this is handled in remove ( see let count = n.min(_size - i))

stefandd avatar Aug 07 '22 16:08 stefandd

Looks like this has been a bug since pretty much the very beginning (8 years ago).

I would take the position that silently truncating the range is the most consistent thing to do here, rather than making the method partial.

The general precedent in the String and Array methods seems to be:

  • If dealing with a single element index that is out of range, raise an error
  • If dealing with a span of element indexes, part or all of which are out of range, silently truncate the span to be in range (possibly making it empty, if all of it was out of range)

If you were going to take the position that the latter class of cases should also be partial, you're going to need to make a lot more of the methods partial than just copy_to and copy_from.

jemc avatar Aug 09 '22 12:08 jemc

Re: "dest index is outside of valid range (should be handled correctly now)"

It's worth noting that by requesting a destination index out of bounds, in a way that leaves "gaps" in between, we might also encounter memory corruption. The code below will segfault in release and debug modes:

class Counter
  var value: U32 = 0
  new create(v': U32) =>
    value = v'

actor Main
  new create(e: Env) =>
    let src = [Counter(0)]
    let dest = [Counter(1)]
    src.copy_to(dest, 0, 2, 1) // past the range
    try
      e.out.print(dest(0)?.value.string()) // valid memory
      e.out.print(dest(1)?.value.string()) // invalid memory
    end

ergl avatar Aug 16 '22 18:08 ergl

Yeah, we definitely don't want to leave gap in between in whatever the desired behavior for this method is.

We discussed this a lot on the sync call and my basic thought is:

I don't mind removing copy_to and/or copy_from, and I also don't mind making them partial, as long as we also provide a new method (e.g. something like append_array) because I think the dst_idx is where the main need for partial-ness comes from, and there's no example I could find on GitHub out in the wild that needed to copy to any destination index besides just the end of the destination buffer. These use cases can't currently use append because it's non-optimal due to not requiring the source to be contiguous memory. If we add a new method like append that uses a contiguous source buffer, I think it serves all current use cases of copy_to and copy_from in the wild.

jemc avatar Aug 16 '22 18:08 jemc

I updated the fix in #4173 to catch a destination index that goes past dest._size - 1.

@jemc I too don't mind removing copy_to and copy_from (especially copy_from which is a bad mirror of copy_to that is restricted to U8?? and which can also be expressed by an equivalent copy_to call) so as long as a low overhead function exists that allows me to copy an entire array into a target array. Being required to call clear on the destination before calling such a new append is still low overhead. However, it is also worth noting that both, making the existing functions partial or removing them altogether would break a lot of my modules.

stefandd avatar Aug 17 '22 15:08 stefandd

After discussing this topic with Sean today, here's a path forward that at least he and I are in agreement on (others here can chime in as well).

This mostly matches what @ergl and I discussed in the sync call 10 days ago, so to hear more about the rationale, listen to that recording.

Make the copy_to method partial, but only one case would result in an error being raised:

  • When some part of the source slice is out of range, the number of elements copied is silently truncated so that it only copies in-range bytes from the source.
  • Instead of returning None, return USize indicating the actual number of elements that were copied (after the aforementioned truncation)
  • When the destination slice is fully within range, the copied elements replace the elements currently in that destination slice
  • When the destination index is equal to the current size of the destination array, the copied elements are appended, increasing the array's size by the copied amount
  • When the destination slice is only partly within range, the in-range part of the copied elements replace existing elements and the rest are appended, increasing the destination size
  • When the destination index is beyond the range, such that if we did expand the size and do the copy there would be a gap of "undefined" elements before the copied elements, then copy_to will raise an error (this is the only case that makes it partial)

Add a new copy_append method which does not allow specifying destination index, and thus would be a non-partial alternative.

Make copy_from partial, remove the artificial U8 restriction, and implement it in terms of copy_to so that the semantics are the same as copy_to (but mirrored)

Add a new copy_from_append method, which mirrors copy_append in the same way.

jemc avatar Aug 26 '22 18:08 jemc

This sounds like a good way forward -- I closed #4173.

stefandd avatar Aug 26 '22 20:08 stefandd

@stefandd would you be interested in implementing the above? You already did a decent amount of work towards the tests and implementation.

SeanTAllen avatar Aug 26 '22 23:08 SeanTAllen

@jemc @SeanTAllen I am happy to, for a start, work on the new copy_to and copy_from functions. The final version of copy_to I had in my PR met the first 5 out of 6 description points above already.

stefandd avatar Aug 27 '22 06:08 stefandd

Sounds great, if you're willing to continue on that path. Thanks!

jemc avatar Aug 28 '22 02:08 jemc

@jemc

Add a new copy_append method which does not allow specifying destination index, and thus would be a non-partial alternative.

Does copy_append allow you to specify a source index and a length? I guess so, but I wanted to be sure.

If so, and with the addition of copy_append, I feel like keeping the "appending" cases in copy_to / copy_from is a bit redundant.

If we're making a breaking change, I'd prefer to specify that any out of range scenario in copy_to / copy_from results in truncation, and leave any appending operations to the new methods being introduced. That way, we avoid marking copy_to partial. For example:

  • When the destination index is equal to the current size of the destination array, the copied elements are appended, increasing the array's size by the copied amount

  • When the destination slice is only partly within range, the in-range part of the copied elements replace existing elements and the rest are appended, increasing the destination size

  • When the destination index is beyond the range, such that if we did expand the size and do the copy there would be a gap of "undefined" elements before the copied elements, then copy_to will raise an error (this is the only case that makes it partial)

These would be modified to the following:

  • When the destination slice is only partly within range, the in-range part of the copied elements replace existing elements and the rest are truncated.

  • When the destination index is equal to the current size of the destination array, or the index is beyond the range, nothing is copied, and the return value is zero.

ergl avatar Aug 29 '22 07:08 ergl

That's an interesting point, and I definitely like the reduced conceptual complexity of that approach (especially that it avoids the mental overhead of remembering which cases truncate and which cases raise an error), though I think it has two downsides to mention:

  1. There wouldn't be any method that serves the use case of copying into a place "near" the end of the destination, using some elements to overwrite, and the rest of the elements being appended to extend the destination. There would be no way to do this in just one copy - you'd have to do first one copy_to for the overwrite part and then one copy_append for the following append part.

  2. This would become a silently breaking change rather than a "fail at compile time" breaking change. That is, in the approach I outlined above, we'd be making copy_to and copy_from partial, which would force everyone to either handle the error case or, more likely, switch to copy_append and copy_from_append (because all the real-world cases I know of are doing the append case). In contrast, the approach you outlined would not force any compile-time failures, and would instead just start failing at runtime for everyone who didn't pay attention to the release notes and update their program.

If we want to take your approach then we may want to consider at least changing the function names for force a compile-time failure so that we don't make a silently breaking behavioral change to an existing function.

jemc avatar Aug 31 '22 01:08 jemc

@ergl @jemc

Please indicate if/when a consensus has been reached among the core developers on how to proceed..

stefandd avatar Sep 20 '22 10:09 stefandd

@stefandd No consensus as of yet afaik. I won't be able to give this the time that it needs until next month.

ergl avatar Sep 20 '22 15:09 ergl

@jemc let's you and i discuss this issue and come to a resolution so we can unblock this and hopefully @stefandd is still interested in fixing it.

SeanTAllen avatar Oct 18 '22 18:10 SeanTAllen

@SeanTAllen I am happy to take a stab at implementing a proposed change to copy_to and copy_from

stefandd avatar Oct 19 '22 20:10 stefandd

We discussed this in sync today. @SeanTAllen and I still want to go with the partial approach that I outlined in this comment: https://github.com/ponylang/ponyc/issues/4174#issuecomment-1228815465

@ergl - what do you think? Do you think you can align with this same consensus? Or do you still prefer the alternative you outlined in this comment: https://github.com/ponylang/ponyc/issues/4174#issuecomment-1229906875

jemc avatar Oct 25 '22 18:10 jemc

@jemc Your proposal sounds good. I came round on your idea: your point about having a silent breaking change was a good one.

ergl avatar Nov 05 '22 17:11 ergl

I believe we have a quorum of agreement now. @stefandd are you still up for working on this?

SeanTAllen avatar Nov 15 '22 19:11 SeanTAllen