Impute.jl

EM (Expectation Maximization) implementation

Open rofinn opened this issue 7 years ago • 2 comments

rofinn avatar Apr 11 '17 23:04 rofinn

Safe and General to arbitary types is impossible AFAICT.

Every package that accepts reading arbitrary types eventually falls back to the Julia serializer, or to internal code that sits behind the serializer. That includes JLD and BSON.jl. Arrow doesn't allow arbitrary types.

I recommend not serializing arbitrary types if you can avoid it. Then you can use JSON, or Arrow, or CSV. It would be nice if we had a plain BSON package, but BSON.jl is an extension to BSON that can also handle arbitrary types.
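A minimal sketch of that trade-off, using only the `Serialization` and `TOML` standard libraries (TOML ships with Julia 1.6 and later). `Reading`, `reading_to_plain`, and `reading_from_plain` are hypothetical names made up for illustration; they are not part of JLSO, BSON.jl, or any package mentioned here.

```julia
using Serialization  # stdlib: handles arbitrary types
using TOML           # stdlib: handles only plain data (strings, numbers, ...)

# Hypothetical example type, not taken from any package in this thread.
struct Reading
    sensor::String
    value::Float64
end

# General route: the serializer round-trips the struct as-is, but the bytes
# depend on this exact definition of Reading being available when reading.
io = IOBuffer()
serialize(io, Reading("a1", 0.5))
seekstart(io)
restored = deserialize(io)

# Restricted route: reduce the object to strings and numbers by hand,
# then write it with a format that knows nothing about Julia types.
reading_to_plain(r::Reading) = Dict("sensor" => r.sensor, "value" => r.value)
reading_from_plain(d::AbstractDict) = Reading(d["sensor"], d["value"])

text = sprint(TOML.print, reading_to_plain(Reading("a1", 0.5)))
roundtrip = reading_from_plain(TOML.parse(text))
```

The second route costs two hand-written conversion functions per type, which is the price of not depending on the serializer.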

I am tempted to close this issue as non-actionable. Our alternative to the Julia serializer is BSON, which is exactly as flawed.

oxinabox avatar Feb 17 '21 15:02 oxinabox

Safe and General to arbitrary types is impossible AFAICT.

I haven't really looked into why impossible, it seems then best to keep the serialize (as it should always work?), and fix the most important package/types e.g. OrderedDict. Is it possible for at least it to have a guarantee?

[FYI: I changed your quote to have the missing r in arbitrary. Just a friendly pointer since you mistyped twice.]

PallHaraldsson avatar Feb 17 '21 16:02 PallHaraldsson

I haven't really looked into why impossible

It's impossible because to guarantee safety, you'd need to:

  1. Manually serialize to a static set of primitive types that will never change for the lifetime of the file format (e.g., CSV, JSON, TOML) and
  2. Have types/structs version their serialization calls such that they can handle schema changes (i.e., fields added/removed/renamed). This is the same as if you did a schema change on the objects in a JSON file or renamed the columns in a CSV.
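The two requirements above can be sketched roughly as follows. This is an illustration under stated assumptions, not how JLSO or BSON.jl work: `Measurement`, `SCHEMA_VERSION`, `measurement_to_plain`, `migrate`, and `measurement_from_plain` are hypothetical names, and the "v1 to v2" change (a field renamed from `name` to `label`, plus a new `unit` field) is invented for the example.

```julia
# Hypothetical type whose layout changed between two releases:
# v1 stored `name`; v2 renamed it to `label` and added `unit`.
struct Measurement
    label::String
    value::Float64
    unit::String
end

const SCHEMA_VERSION = 2

# Requirement 1: write only primitive values, plus an explicit version tag.
function measurement_to_plain(m::Measurement)
    return Dict{String,Any}(
        "version" => SCHEMA_VERSION,
        "label" => m.label,
        "value" => m.value,
        "unit" => m.unit,
    )
end

# Requirement 2: upgrade old records step by step before reconstructing.
function migrate(d::AbstractDict)
    d = Dict{String,Any}(d)            # copy so the caller's record is untouched
    version = get(d, "version", 1)     # assume untagged records are v1
    if version == 1
        d["label"] = pop!(d, "name")   # field renamed in v2
        d["unit"] = ""                 # assumed default for the field added in v2
        version = 2
    end
    version == SCHEMA_VERSION || error("unsupported schema version $version")
    d["version"] = version
    return d
end

function measurement_from_plain(d::AbstractDict)
    d = migrate(d)
    return Measurement(d["label"], d["value"], d["unit"])
end

# A record written by the hypothetical v1 code still loads under v2.
old_record = Dict("version" => 1, "name" => "temp", "value" => 21.5)
m = measurement_from_plain(old_record)
```

The resulting dictionaries contain only strings and numbers, so they could be written with any of the static formats listed in step 1.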

it seems then best to keep the serialize (as it should always work?)

Do you mean keep the current default of BSON + gzipped julia serialization?

fix the most important package/types e.g. OrderedDict. Is it possible for at least it to have a guarantee?

It might help if you linked to the specific thread and included a stacktrace, because neither JLSO.jl nor BSON.jl explicitly depends on OrderedDict for its internal representation. I don't think it should be the responsibility of this package to dictate the serializability of other arbitrary types in the Julia ecosystem. We're happy to support more serialization formats (e.g., Arrow, CSV, JLD2), but they'll all have their own strengths and weaknesses (safety vs generalizability), as @oxinabox has already pointed out.

rofinn avatar Feb 17 '21 20:02 rofinn

Do you mean keep the current default of BSON + gzipped julia serialization?

No, because I thought BSON was non-default. I had BSON in mind as a new default when opening the issue, but if that is unworkable, then I have basically changed my mind: "keep the serialize" meant the status quo here, with the fix needing to be elsewhere.

The thread I had in mind with "I did [see] a thread" (fixed missing "see"):

https://discourse.julialang.org/t/orderedcollections-not-binary-compatible-with-prior-version/54540

PallHaraldsson avatar Feb 17 '21 21:02 PallHaraldsson