julia icon indicating copy to clipboard operation
julia copied to clipboard

don't reset array length and size fields on array_to_string

Open d-netto opened this issue 9 months ago • 10 comments

Should fix https://github.com/JuliaLang/julia/issues/54275.

d-netto avatar Apr 29 '24 14:04 d-netto

I'm a bit confused, because the array you get is empty

julia> a = rand(UInt8, 5)
5-element Vector{UInt8}:
 0xda
 0x73
 0x44
 0x3e
 0xb5

julia> str = String(a)
"\xdasD>\xb5"

julia> a
UInt8[]

So i'm a bit suspicious of how this is implemented. This is moving the memory from the Array to the string, so freeing the string is what should do the change.

gbaraldi avatar Apr 29 '24 15:04 gbaraldi

Yes, that's one source of confusion for me as well. As it is, it seems like jl_array_to_string is calling jl_pchar_to_string which does copy the array into the allocated string.

For some reason, we're still resetting the indices of the Array{UInt8}, even though its buffer has not been moved and is still allocated in the Array{UInt8}.

I see two paths forward:

  • Don't reset the indices of the Array{UInt8} to ensure they actually reflect the fact that the buffer has been copied, and not moved (this PR).
  • Don't allocate a new buffer to the String and actually move the ownership of the buffer of the Array{UInt8} to the String.

d-netto avatar Apr 29 '24 15:04 d-netto

Three possible paths forward, actually, the third one being what you suggested: implement a function to free the buffer which has been moved and make sure it's correctly accounted in GC live bytes.

d-netto avatar Apr 29 '24 15:04 d-netto

I believe the issue is that ->maxsize shouldn't be reset. The others should be reset.

gbaraldi avatar Apr 29 '24 15:04 gbaraldi

I believe the issue is that ->maxsize shouldn't be reset

Yes, this does solve https://github.com/JuliaLang/julia/issues/54275.

Still, it's a bit odd that constructing a String from an Array{UInt8} takes ownership of the corresponding buffer and resets the array. Copying the buffer and keeping the array intact seems like the less surprising/most intuitive behavior here.

If this is breaking, then yes, let's just solve the accounting bug and keep the observable behavior of String(::Array{UInt8}) as it is.

d-netto avatar Apr 29 '24 15:04 d-netto

I think the current behavior of taking ownership is desired for performance. If you have a 1gb Vector{Uint8} having to copy all that data to get a string will be pretty slow (and double your memory requirement).

oscardssmith avatar Apr 29 '24 16:04 oscardssmith

If you have a 1gb Vector{Uint8} having to copy all that data to get a string will be pretty slow (and double your memory requirement).

If I'm not misunderstanding something, that's precisely what we do in the C Array implementation? Link.

d-netto avatar Apr 29 '24 16:04 d-netto

Right, so if you don't copy the memory, the reason we set maxsize to 0 is to prevent the original array from mutating the string afterward.

oscardssmith avatar Apr 29 '24 16:04 oscardssmith

Right, so if you don't copy the memory, the reason we set maxsize to 0 is to prevent the original array from mutating the string afterward.

I meant to say that at least in String(::Array{UInt8}) we do copy the underlying buffer.

d-netto avatar Apr 29 '24 16:04 d-netto

In that context, I really have no idea why we're doing this (but I want to make sure @vtjnash gets a chance to weigh in here)

oscardssmith avatar Apr 29 '24 16:04 oscardssmith

This case is only if you built the array weirdly for constructing a string (eg with pushfirst). If we have to copy the data anyways to shift it, we might as well allocate a new one of the exact right size too

vtjnash avatar Apr 29 '24 18:04 vtjnash

In that case, to avoid changing the behavior of resetting the array, I just added a call (jl_gc_count_freed ) to track the number of freed bytes when we copy such buffer and reset the array.

Does it seem more reasonable to you folks?

d-netto avatar Apr 29 '24 18:04 d-netto

I think your previous change (only maxsize) is more correct here. Because the buffer hasn't been freed yet, we've just resized the array, the memory is still there.

gbaraldi avatar Apr 29 '24 19:04 gbaraldi

@gbaraldi that fix isn't correct because with it, a push! to the array will expand back into the String's memory which would corrupt it.

oscardssmith avatar Apr 29 '24 21:04 oscardssmith

This is moving the memory from the Array to the string, so freeing the string is what should do the change

That is not quite accurate. It is not moving memory, as that memory was always owned by the String, it is simply deleting the reference to that String that used to be part of the Array.

vtjnash avatar Apr 29 '24 21:04 vtjnash

So we are talking about the case at the bottom where we just make a copy.

gbaraldi avatar Apr 29 '24 22:04 gbaraldi

Bump.

d-netto avatar May 01 '24 17:05 d-netto

Can you also make the master version of this PR?

oscardssmith avatar May 01 '24 18:05 oscardssmith