JLD2.jl
Explicit Type Remapping & Anonymous Functions
This PR finally implements what is needed to store anonymous functions using JLD2. Most of the Julia side of things is borrowed from BSON, but additional trickery was needed to integrate all this with JLD2.
AFAICT the memory layout of functions / typenames / methods has changed from Julia 1.5 to 1.6, and this PR only supports 1.6.
As a side effect, this PR also implements explicit type remapping to allow renaming types on load. This can be useful when working with multiple versions of the same struct (e.g. when the file still contains the old definition).
Explicit Type Remapping
Sometimes you store data using structs that you defined yourself or that are shipped with some package, and weeks later, when you want to load the data, the structs have changed.
using JLD2
struct A
    x::Int
end
jldsave("example.jld2"; a = A(42))
Loading the file after the struct definition has changed results in warnings, and sometimes even errors, as demonstrated here.
julia> using JLD2
julia> struct A{T}
           x::T
       end
julia> load("example.jld2")
┌ Warning: read type A is not a leaf type in workspace; reconstructing
└ @ JLD2 ~/.julia/dev/JLD2/src/data/reconstructing_datatypes.jl:273
Dict{String, Any} with 1 entry:
"a" => var"##A#257"(42)
As of JLD2 version v0.4.5 there is a fix. The JLDFile struct contains a type_map dictionary that allows for explicit type remapping. Now you can define a struct that matches the old definition and load your data.
julia> struct A_old
           x::Int
       end
julia> f = jldopen("example.jld2","r")
JLDFile /home/jonas/.julia/dev/JLD2/example.jld2 (read-only)
└─🔢 a
julia> f.type_map["Main.A"] = A_old
A_old
julia> f["a"]
A_old(42)
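The idea behind the remapping can be sketched in plain Julia without touching JLD2 at all. The snippet below is only an illustration: `StoredRecord` and `reconstruct` are hypothetical names and do not reflect JLD2's internals or on-disk format. It shows a lookup from the stored type name to a user-chosen type, with a fallback when no mapping is given.

```julia
# Hypothetical stand-in for something read from a file: the stored type name
# plus the raw field values. This is NOT JLD2's actual representation.
struct StoredRecord
    typename::String
    fields::Vector{Any}
end

# A struct whose layout matches the type that was originally written.
struct A_old
    x::Int
end

# Look the stored name up in the user-supplied map. If there is no entry,
# hand back the raw record instead of guessing a type.
function reconstruct(rec::StoredRecord, type_map::AbstractDict)
    T = get(type_map, rec.typename, nothing)
    T === nothing && return rec
    return T(rec.fields...)
end

type_map = Dict{String,Any}("Main.A" => A_old)
rec = StoredRecord("Main.A", Any[42])
obj = reconstruct(rec, type_map)
```

With the mapping in place `obj` is an `A_old`; a record whose name is missing from `type_map` comes back unchanged.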
closes #208
closes #191
closes #175
closes #288
todo
storing typeof(anonfun) #37
Codecov Report
Merging #316 (e3f52a7) into master (a9c62a6) will increase coverage by 0.29%. The diff coverage is 98.03%.
@@ Coverage Diff @@
## master #316 +/- ##
==========================================
+ Coverage 89.88% 90.18% +0.29%
==========================================
Files 27 28 +1
Lines 2720 2813 +93
==========================================
+ Hits 2445 2537 +92
- Misses 275 276 +1
Impacted Files | Coverage Δ | |
---|---|---|
src/file_header.jl | 78.57% <ø> (ø) | |
src/data/anonymous_functions.jl | 96.66% <96.66%> (ø) | |
src/JLD2.jl | 90.85% <100.00%> (+1.14%) | :arrow_up: |
src/data/reconstructing_datatypes.jl | 76.36% <100.00%> (+2.36%) | :arrow_up: |
src/data/writing_datatypes.jl | 96.96% <100.00%> (+0.38%) | :arrow_up: |
src/backwards_compatibility.jl | 62.50% <0.00%> (-12.50%) | :arrow_down: |
src/dataio.jl | 98.44% <0.00%> (-0.02%) | :arrow_down: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
What is missing from this PR? AFAIK, after this, JLD2 would be a better candidate than BSON for serializing Flux models in basically any situation.
There are two things that are missing:
- Proper review. Currently, I appear to be the only one familiar enough with JLD2 internals and willing to implement stuff like this. Since JLD2 is used by a lot of people, I was hesitant to just merge this without outside opinions.
- I'd really like to resolve #37, but this problem is deeply embedded in JLD2 and not fixable without "breaking" changes. If I merge this PR before fixing #37, I will have to implement even more legacy handling to avoid breaking anyone's files.
The issue with #37 is this: for every dataset, JLD2 essentially stores the
- content
- description of content (e.g. memory layout on disk)
- name of datatype
This works well for data, but for datatypes JLD2 is hardcoded to use the datatype signature as content. Thus, if the signature of a stored datatype is not known in a new Julia session, the datatype is impossible to reconstruct.
The fix:
- Change serialization of datatypes to contain description of their (instance) layout
- Change deserialization to create a new datatype from description when loaded datatype is not known.
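To make the two steps of the fix more concrete, here is a rough sketch in plain Julia. It is not how JLD2 does or will do this: `TypeDescription`, `describe`, and `rebuild` are hypothetical names, and a `NamedTuple` merely stands in for "a new datatype created from the description".

```julia
# Hypothetical description of a datatype's instance layout: its name plus the
# names and (stringified) types of its fields.
struct TypeDescription
    name::String
    fieldnames::Vector{Symbol}
    fieldtypes::Vector{String}
end

# Step 1: when writing, store a description of the layout alongside the data.
describe(T::DataType) = TypeDescription(
    string(nameof(T)),
    Symbol[fieldnames(T)...],
    String[string(ft) for ft in fieldtypes(T)],
)

# Step 2: when reading, use the real type if the session knows it; otherwise
# build a stand-in from the description (here simply a NamedTuple).
function rebuild(desc::TypeDescription, values, known::AbstractDict)
    T = get(known, desc.name, nothing)
    T !== nothing && return T(values...)
    return NamedTuple{Tuple(desc.fieldnames)}(Tuple(values))
end

struct Point
    x::Int
    y::Float64
end

desc = describe(Point)
known = Dict{String,Any}("Point" => Point)

p_known   = rebuild(desc, Any[1, 2.0], known)               # type is known
p_unknown = rebuild(desc, Any[1, 2.0], Dict{String,Any}())  # type is unknown
```

The point of the sketch is only that the description travels with the data, so loading no longer depends on the signature being resolvable in the new session.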
Is this branch workable for anonymous functions now? I tried the current release version: it correctly saves and loads a dataset containing anonymous functions within a single Julia session, but when I start a new Julia session and load the same packages, it loads everything except the anonymous functions.
I also tried BSON, JLD, and JLSO. BSON failed when saving, probably because my dataset contains NamedTuples of different types. JLD could save but failed to load. JLSO behaves like JLD2: it can save and load within a single session but cannot load in a new session.
Hi @babaq, I'm afraid it is not. I built this at some point and got it partially working. However, there have been significant changes to how this works between e.g. Julia 1.6 and 1.7, so it is very difficult to get working reliably. Something else you could try is #377. This is a pathfinder PR that would, in principle, allow us to write objects (such as anonymous functions) as binary blobs using the Julia Serialization stdlib. It still needs work, but I hope that this is more doable.
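For reference, the binary-blob idea can be tried with the Serialization stdlib on its own. This is a minimal sketch of that stdlib only, not of what #377 implements; the variable names are made up for illustration. Note that the Serialization format is not guaranteed to be readable across different Julia versions.

```julia
using Serialization

# An anonymous function we would like to persist.
f = x -> 2x + 1

# Serialize it into an in-memory buffer and grab the raw bytes.
io = IOBuffer()
serialize(io, f)
blob = take!(io)            # Vector{UInt8} that could be written to a file

# Read the function back from the bytes.
g = deserialize(IOBuffer(blob))

# invokelatest avoids world-age issues in case deserialization had to define
# new methods for the function's type.
result = Base.invokelatest(g, 3)
```

A blob like this could in principle be stored as an opaque byte array inside a file, which sidesteps having to describe the function's internals in the file format itself.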