DrWatson.jl
savename and path-argument
I came across what someone might consider a bug.
julia> p = Dict(:sourcefile => "path/to/my/sourcefile")
Dict{Symbol,String} with 1 entry:
:sourcefile => "path/to/my/sourcefile"
julia> savename(p)
"sourcefile=path/to/my/sourcefile"
julia> produce_or_load(p, p -> p)
┌ Warning: Using the standard Julia project.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/project_setup.jl:30
[ Info: File sourcefile=path/to/my/sourcefile.bson does not exist. Producing it now...
┌ Warning: The directory ('/home/jonas/.julia/environments/v1.4') is not a Git repository, returning `nothing` instead of the commit ID.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/saving_tools.jl:48
┌ Warning: The directory ('/home/jonas/.julia/environments/v1.4') is not a Git repository, returning `nothing` instead of a patch.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/saving_tools.jl:95
[ Info: File sourcefile=path/to/my/sourcefile.bson saved.
(Dict(:sourcefile => "path/to/my/sourcefile"), "sourcefile=path/to/my/sourcefile.bson")
shell> tree
.
└── sourcefile=path
    └── to
        └── my
            └── sourcefile.bson
3 directories, 1 file
What do you think?
Should we add a warning when path delimiters are part of savename, or should we escape them?
Another thought:
Would anyone be interested in a savename option to output a hash instead of the normal behaviour? That would help when the string becomes longer than the OS allows.
I guess we should in general do more checks in savename(), since there are more characters which can cause problems. Especially on Windows the list is quite large (<>/?*|": etc.).
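To make the "more checks" idea concrete, here is a minimal sketch of a check that warns and replaces problematic characters. The names UNSAFE_CHARS and sanitize_name are hypothetical and not part of DrWatson; the character list is only an illustration based on the ones mentioned above.

```julia
# Characters that are unsafe in a file name on at least one common OS.
# (Illustrative list only; it is not taken from DrWatson.)
const UNSAFE_CHARS = ('<', '>', ':', '"', '/', '\\', '|', '?', '*')

# Hypothetical helper: replace every unsafe character in `name` and
# warn if anything had to be replaced.
function sanitize_name(name::AbstractString; replacement::Char = '-')
    cleaned = map(c -> c in UNSAFE_CHARS ? replacement : c, name)
    if cleaned != name
        @warn "Name contains characters that are unsafe in file names; replacing them." name cleaned
    end
    return cleaned
end

sanitize_name("sourcefile=path/to/my/sourcefile")
```

One caveat of replacing rather than just warning: two different inputs (for example "a/b" and "a-b") end up with the same name, so a warning alone may be the safer default.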
What do you mean by a hash? Something like a base64 encoding or so?
Would anyone be interested in a savename option to output a hash instead of the normal behaviour? That would help when the string becomes longer than the OS allows.
I think this is a good idea, but probably better to do it as a separate function. savename is already so heavy...
Should we add a warning when path delimiters are part of savename or should we escape them?
How does this work on the actual name of the file? Isn't it impossible to save a file that contains / or \ in its name? At least on Windows? I'd say go for the warning.
Well, somehow I ended up with the tree below. And by the way, this is also why I wanted to use some kind of hash instead.
.
├── Ithreshup100k=5_b=0.16_liftlockdowndelay=14_localdir=
│   └── data
│       └── username
│           └── epiproject
│               └── dataset_20200525
│                   ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
│                   └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson
├── Ithreshup100k=5_b=0.2_liftlockdowndelay=14_localdir=
│   └── data
│       └── username
│           └── epiproject
│               └── dataset_20200525
│                   ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
│                   └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson
└── Ithreshup=5_b=0.24_liftlockdowndelay=14_localdir=
    └── data
        └── username
            └── epiproject
                └── dataset_20200525
                    ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
                    └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson
But do you still get the same thing if you escape the slashes?
I guess we should in general do more checks in savename() since there are more characters which can cause problems. Especially on windows the list is quite large (<>/?*|": etc.).
What do you mean by a hash? Something like a base64 encoding or so?
Hm, I looked at base64, but my impression is that filenames would not get shorter. Base.hash looks more promising, even if it is not reversible.
but do you still get same thing if you escape slashes?
How would you like me to escape them? Doing // does not work and \/ also errors.
or should we escape them?
You suggested that they can be escaped :P I never thought it was possible :P
Base.hash looks more promising even if it is not reversible.
I've tested a bit and Base.hash(savename(...)) gives the same hash for the same input string. It is not invertible but it is deterministic, which is one of the main purposes of savename. What I wonder is whether these hashes change from Julia version to Julia version.
I don't know about that.
Also, I was mostly thinking of using Base.hash(c) instead of savename(c).
There is no point in still risking string rounding issues, a.k.a. 0.154 != 0.153 but "0.15" == "0.15", when we're not using the string as a filename anyway.
EDIT: If you are just using savename by itself, then of course you can just exchange it for Base.hash, but then you can't use produce_or_load anymore. (Which was my application)
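On the open question of whether hashes stay the same between Julia versions: as far as I know, Base.hash makes no such promise, so a cryptographic digest may be the safer choice for file names. Below is a minimal sketch using SHA-256 from the SHA standard library, whose output is fixed by the algorithm itself. The helper hashed_name and the truncation to 16 hex characters are assumptions for illustration, not anything DrWatson provides.

```julia
using SHA  # standard library

# Hypothetical helper: map an arbitrarily long name to a short,
# fixed-length string of hex digits that is safe as a file name.
hashed_name(name::AbstractString; len::Integer = 16) =
    first(bytes2hex(sha256(name)), len)

long = "Ithreshup100k=5_b=0.16_liftlockdowndelay=14_localdir=/data/username/epiproject"
short = hashed_name(long)   # 16 hex characters, no path delimiters
```

Truncating the digest keeps names short at the cost of a (small) collision risk; pass a larger len to reduce it. Like Base.hash, this is not reversible, so the parameters have to be stored somewhere else if they are needed later.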
Btw. alternatively we can also think about a meta file, which would save the information in an external file. savename could provide a hash and a JSON file could hold the parameters. Just thinking out loud...
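A rough sketch of what such a meta file could look like. To keep it free of external packages it writes TOML via the TOML standard library (Julia 1.6 or newer) instead of the JSON mentioned above; that swap, the helper name write_params_sidecar, and the 16-character hash are all assumptions for illustration. Parameter values must be types that TOML can represent (strings, numbers, booleans, arrays, nested dicts).

```julia
using SHA, TOML  # standard libraries (TOML needs Julia 1.6 or newer)

# Hypothetical helper: write the parameters to "<hash>.toml" inside `dir`
# and return the hash, which can then serve as the data file's base name.
function write_params_sidecar(dir::AbstractString, params::AbstractDict)
    # TOML requires string keys; convert Symbol keys and the like.
    table = Dict{String,Any}(string(k) => v for (k, v) in params)
    # Sorted output, so the same parameters always give the same text.
    text = sprint(io -> TOML.print(io, table; sorted = true))
    id = first(bytes2hex(sha256(text)), 16)
    write(joinpath(dir, id * ".toml"), text)
    return id
end

dir = mktempdir()
params = Dict(:sourcefile => "path/to/my/sourcefile", :b => 0.16)
id = write_params_sidecar(dir, params)
restored = TOML.parsefile(joinpath(dir, id * ".toml"))
```

Here the hash is computed from the serialized parameters rather than from savename, so path delimiters in a value never reach the file system, and the sidecar file makes the otherwise irreversible hash readable again.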
that might be an option.
Have a look at https://github.com/invenia/JLSO.jl !
I just found this and I think this definitely needs a shoutout in the docs.
It includes a project and manifest as metadata with any file so later you can just activate a file environment.
Wait, can't this replace BSON.jl entirely...?
Well, it says it uses BSON for storing the metadata, so you can't get rid of BSON entirely. However, maybe it's more stable regarding custom types. The Julia serializer works very well, though the docs say it only works reliably for the same Julia version. Maybe storing the state of the current install helps with that problem.
https://docs.julialang.org/en/v1/stdlib/Serialization/
In general, this process will not work if the reading and writing are done by different versions of Julia, or an instance of Julia with a different system image.
I kinda missed that discussion, but the idea with the metadata is basically what my metadata implementation is about. Regarding hashing I found that this
https://github.com/sebastianpech/DrWatsonSim.jl/blob/820d26eaea671a798788cf360e112b316202d14c/src/Metadata.jl#L50-L51
works quite well.