julia
julia copied to clipboard
New `copystring` function
Since calling String
on a view
of an AbstractVector{UInt8}
is potential UB https://github.com/JuliaLang/julia/pull/54372#issuecomment-2102885169
I think it would be helpful to have a function.
copystring(v::AbstractVector{UInt8}) = take_string!(copyto!(StringMemory(length(v)), v)))
I could use this to fix ZipArchives.jl
by replacing current uses of the String
constructor on vectors of bytes with copystring
.
Would this be better in a separate CopyStrings.jl
package?
Here is an example where using the String
constructor on a Vector{UInt8}
can result in mutating a String
io = IOBuffer()
write(io, [0x61,0x62,0x63])
seekstart(io)
a = read(io, 3)
b = reshape(a, 1, 3)
c = reshape(b, 3)
s = String(a)
@show s
c[1] = 0x62
@show s
This changes s
from "abc"
to "bbc"
This happens on Julia 1.6 as well as nightly.
copystring
sounds like a string is being copied which is confusing.
Maybe bytes2string
is a better name?
Python calls this function bytes.decode
.
bytes2string
is clear, which is the most important factor imo.
2
names in Base: bytes2hex
, deg2rad
, hex2bytes
, hex2bytes!
, rad2deg
.
to
names in Base: htol
, hton
, ltoh
, ntoh
.
It should maybe take a Type{AbstractString}
argument so it can support Strs.jl etc.
There is already a transcode(::Type{String}, src)
function.
https://github.com/JuliaLang/julia/blob/d01d2561e69822667a8d809064109cd46fba22c1/base/strings/cstring.jl#L179
Maybe this can be changed to avoid mutating the input.
One solution could be to make take!
and read
on IOBuffer
not use Base.StringVector
, but allocate a normal vector. This will make using IOBuffer
for creating strings less efficient, but that is addressed by the new takestring!
function in https://github.com/JuliaLang/julia/pull/54372.
I'm willing to pitch a PR if that's an acceptable solution.
I think that is a good solution.
This will make using IOBuffer for creating strings less efficient,
That seems completely unacceptable? Isn't the whole reason not to make this slow pretty much the reason for the current behavior of String
?
@KristofferC the point is that with this change takestring!(iobuffer)
will be as efficient as String(take!(iobuffer))
is now, while String(take!(iobuffer))
will have a copy. The reason for this change is that a single takestring!
fully encapsulates the sketchy part where an immutable String
is created from mutable data without a copy, while our current idiom for this exposes the mutable object to the user which creates the possibility of the user accidentally holding on to a reference of the mutable data.
With this change in combination with https://github.com/JuliaLang/julia/pull/54372, existing users will see a regression that is fixed by replacing instances of String(take!(iobuffer))
with takestring!(iobuffer)
(we will likely make a Compat PR to make takestring!
available in older julia versions).
he point is that with this change takestring!(iobuffer) will be as efficient as String(take!(iobuffer)) is now, while String(take!(iobuffer)) will have a copy.
The whole reason we didn't do this X years ago was that adding this type of perf regression to code run everywhere was not acceptable. I don't see why it would be now.
This is the comment I was thinking about: https://github.com/JuliaLang/julia/issues/24388#issuecomment-341206183
Those who don't know history are destined to repeat it.
Reading through https://github.com/JuliaLang/julia/issues/24388, https://github.com/JuliaLang/julia/pull/25241, https://github.com/JuliaLang/julia/pull/25846, https://github.com/JuliaLang/julia/pull/26093 is probably a good idea for people interested in this to know a bit about the discussing and history that has lead to the current behavior.
String(copy(v))
still makes two copies as noted in https://github.com/JuliaLang/julia/pull/26093#issuecomment-366809258 In ZipArchives.jl I would like to avoid the extra copy. This is currently possible with unsafe_string
, so an extra function in Base is not needed, but IMO it would be nice to avoid having to deal with pointers.
Fixed by #54927