BSON.jl
BSON.jl copied to clipboard
Cannot save array with undefined values
I am opening this issue so we can keep track of our progress in solving this bug.
Currently, BSON is unable to save an array that has one or more undefined values. Here is the minimum working example:
using BSON
x = Array{String, 1}(undef, 5)
x[1] = "a"
x[4] = "d"
println(x)
bson("test.bson", Dict(:x => x))
Expected behavior: BSON saves the data as requested.
Actual behavior: the following error is thrown:
ERROR: UndefRefError: access to undefined reference
Stacktrace:
[1] lower at /Users/dilum/.julia/packages/BSON/XPZLD/src/extensions.jl:66 [inlined]
[2] _lower_recursive(::Array{String,1}, ::IdDict{Any,Any}, ::Array{Any,1}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:62
[3] (::getfield(BSON, Symbol("##7#11")){IdDict{Any,Any},Array{Any,1}})(::Array{String,1}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:62
[4] applychildren!(::getfield(BSON, Symbol("##7#11")){IdDict{Any,Any},Array{Any,1}}, ::Dict{Symbol,Any}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/BSON.jl:21
[5] _lower_recursive(::Dict{Symbol,Array{String,1}}, ::IdDict{Any,Any}, ::Array{Any,1}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:62
[6] lower_recursive(::Dict{Symbol,Array{String,1}}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:73
[7] bson(::IOStream, ::Dict{Symbol,Array{String,1}}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:81
[8] #14 at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:83 [inlined]
[9] #open#310(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(BSON, Symbol("##14#15")){Dict{Symbol,Array{String,1}}}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:375
[10] open at ./iostream.jl:373 [inlined]
[11] bson(::String, ::Dict{Symbol,Array{String,1}}) at /Users/dilum/.julia/packages/BSON/XPZLD/src/write.jl:83
[12] top-level scope at none:0
It seems the culprit is the ... operator. More specifically the code
Any[x...]
on line ([1] lower at /Users/.../.julia/packages/BSON/XPZLD/src/extensions.jl:66)
This is where it tries to unpack the values of the array into a new array, but this doesn't work on arrays with '#undef' in them.
This behavior works with a different type of array though:
using BSON x = Array{Int, 1}(undef, 5) x[1] = 1 x[4] =3 println(x) bson("test.bson", Dict(:x => x))
BSON is able to save as expected because x gets initialized with zeros so the ... operator has something to unpack, where as Array{String, 1}(undef, 5) is initialized with #undef, so the ... operator breaks.
This seems like an even more basic issue and I see it as two solutions:
- Change how UndefInitializer() acts with arrays of type String to generate empty string, or something concrete. OR
- Change the functionality of the ... operator to be able to pass along undef somehow. (This option seems much more generalizable and the right thing to do.
- Change the functionality of the ... operator to be able to pass along undef somehow. (This option seems much more generalizable and the right thing to do.
@MikeInnes and I experimented with some code that bypassed the ...
splatting step. We still ran into an issue writing and reading the #undef
elements. So fixing the splatting won't solve this issue.
Change how UndefInitializer() acts with arrays of type String to generate empty string, or something concrete.
Certainly that would work, but would require a breaking change in the Julia language, and I'm not sure if there is support for that.
Does that mean someone just needs to find a way to read and write #undef
elements?
Is that the main fix that needs to happen with this issue?
Yep!
@DilumAluthge, So if this is fixed you and @MikeInnes can patch the splat operator issue? Because for this whole problem to be solved that will also need to be fixed.
Also have you worked on looking at the underlying issue of reading and writing #undef? I don't want to reinvent the wheel if you guys already have good leads.
@DilumAluthge, So if this is fixed you and @MikeInnes can patch the splat operator issue? Because for this whole problem to be solved that will also need to be fixed.
Yep! Take a look here at an example of how we can replace the splat operator with a function collect_any
. So that patch is already written.
Also have you worked on looking at the underlying issue of reading and writing #undef? I don't want to reinvent the wheel if you guys already have good leads.
You can take a look at Mike's work in https://github.com/MikeInnes/BSON.jl/pull/8. In PR #8, Mike wrote nothing
s in place of #undef
. Unfortunately, this ends up giving you the wrong eltype
. For example, you'd end up with an Array{Union{Nothing, String}}
instead of the desired Array{Nothing}
. So I don't think that writing nothing
s in place of #undef
is a good solution.
I think the best solution is going to be to write a unique sentinel in place of the #undef
values. The BSON specification actually reserves \x06
for "undefined" values, so I think we should use that. (Technically it's now deprecated, but as Mike pointed out in another thread, it's okay for us to deviate a little from the spec. We are not striving for 100% compatibility with other implementations of BSON.)
#47 Hopefully should take are of these issues. @MikeInnes
Bump. Was bitten by this