julia
julia copied to clipboard
don't reset array length and size fields on array_to_string
Should fix https://github.com/JuliaLang/julia/issues/54275.
I'm a bit confused, because the array you get is empty
julia> a = rand(UInt8, 5)
5-element Vector{UInt8}:
0xda
0x73
0x44
0x3e
0xb5
julia> str = String(a)
"\xdasD>\xb5"
julia> a
UInt8[]
So i'm a bit suspicious of how this is implemented. This is moving
the memory from the Array to the string, so freeing the string is what should do the change.
Yes, that's one source of confusion for me as well. As it is, it seems like jl_array_to_string
is calling jl_pchar_to_string
which does copy the array into the allocated string.
For some reason, we're still resetting the indices of the Array{UInt8}
, even though its buffer has not been moved and is still allocated in the Array{UInt8}
.
I see two paths forward:
- Don't reset the indices of the
Array{UInt8}
to ensure they actually reflect the fact that the buffer has been copied, and not moved (this PR). - Don't allocate a new buffer to the
String
and actually move the ownership of the buffer of theArray{UInt8}
to theString
.
Three possible paths forward, actually, the third one being what you suggested: implement a function to free the buffer which has been moved and make sure it's correctly accounted in GC live bytes.
I believe the issue is that ->maxsize shouldn't be reset. The others should be reset.
I believe the issue is that ->maxsize shouldn't be reset
Yes, this does solve https://github.com/JuliaLang/julia/issues/54275.
Still, it's a bit odd that constructing a String
from an Array{UInt8}
takes ownership of the corresponding buffer and resets the array. Copying the buffer and keeping the array intact seems like the less surprising/most intuitive behavior here.
If this is breaking, then yes, let's just solve the accounting bug and keep the observable behavior of String(::Array{UInt8})
as it is.
I think the current behavior of taking ownership is desired for performance. If you have a 1gb Vector{Uint8} having to copy all that data to get a string will be pretty slow (and double your memory requirement).
If you have a 1gb Vector{Uint8} having to copy all that data to get a string will be pretty slow (and double your memory requirement).
If I'm not misunderstanding something, that's precisely what we do in the C Array implementation? Link.
Right, so if you don't copy the memory, the reason we set maxsize to 0 is to prevent the original array from mutating the string afterward.
Right, so if you don't copy the memory, the reason we set maxsize to 0 is to prevent the original array from mutating the string afterward.
I meant to say that at least in String(::Array{UInt8})
we do copy the underlying buffer.
In that context, I really have no idea why we're doing this (but I want to make sure @vtjnash gets a chance to weigh in here)
This case is only if you built the array weirdly for constructing a string (eg with pushfirst). If we have to copy the data anyways to shift it, we might as well allocate a new one of the exact right size too
In that case, to avoid changing the behavior of resetting the array, I just added a call (jl_gc_count_freed
) to track the number of freed bytes when we copy such buffer and reset the array.
Does it seem more reasonable to you folks?
I think your previous change (only maxsize) is more correct here. Because the buffer hasn't been freed yet, we've just resized the array, the memory is still there.
@gbaraldi that fix isn't correct because with it, a push!
to the array will expand back into the String
's memory which would corrupt it.
This is moving the memory from the Array to the string, so freeing the string is what should do the change
That is not quite accurate. It is not moving memory, as that memory was always owned by the String, it is simply deleting the reference to that String that used to be part of the Array.
So we are talking about the case at the bottom where we just make a copy.
Bump.
Can you also make the master version of this PR?