DrWatson.jl

savename and path-argument

Open JonasIsensee opened this issue 4 years ago • 14 comments

I came across what someone might consider a bug.

julia> p = Dict(:sourcefile => "path/to/my/sourcefile")
Dict{Symbol,String} with 1 entry:
  :sourcefile => "path/to/my/sourcefile"

julia> savename(p)
"sourcefile=path/to/my/sourcefile"

julia> produce_or_load(p, p -> p)
┌ Warning: Using the standard Julia project.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/project_setup.jl:30
[ Info: File sourcefile=path/to/my/sourcefile.bson does not exist. Producing it now...
┌ Warning: The directory ('/home/jonas/.julia/environments/v1.4') is not a Git repository, returning `nothing` instead of the commit ID.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/saving_tools.jl:48
┌ Warning: The directory ('/home/jonas/.julia/environments/v1.4') is not a Git repository, returning `nothing` instead of a patch.
└ @ DrWatson ~/.julia/packages/DrWatson/z56YI/src/saving_tools.jl:95
[ Info: File sourcefile=path/to/my/sourcefile.bson saved.
(Dict(:sourcefile => "path/to/my/sourcefile"), "sourcefile=path/to/my/sourcefile.bson")

shell> tree
.
└── sourcefile=path
    └── to
        └── my
            └── sourcefile.bson

3 directories, 1 file

What do you think? Should we add a warning when path delimiters are part of savename or should we escape them?

Another thought: Would anyone be interested in a savename option to output a hash instead of the normal behaviour? That would help when the string becomes longer than the OS allows.

JonasIsensee avatar Jun 26 '20 07:06 JonasIsensee

I guess we should in general do more checks in savename(), since more characters can cause problems. On Windows especially, the list is quite large (<>/?*|": etc.).

What do you mean by a hash? Something like a base64 encoding or so?

tamasgal avatar Jun 26 '20 08:06 tamasgal
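
The check discussed in the comments above can be sketched in plain Julia. `check_filename` and `sanitize_filename` are hypothetical helper names, not part of DrWatson, and the character list is only the set mentioned above plus the backslash; treat it as an assumption rather than a complete list.

```julia
# Hypothetical helpers, not part of DrWatson's API.
# Path separators and characters reserved in Windows file names (assumed list).
const PROBLEMATIC_CHARS = ('<', '>', ':', '"', '/', '\\', '|', '?', '*')

# Return the distinct problematic characters found in `name`, warning if any exist.
function check_filename(name::AbstractString)
    found = unique(filter(in(PROBLEMATIC_CHARS), collect(name)))
    isempty(found) || @warn "Name contains characters that are unsafe in file names" name found
    return found
end

# Replace every problematic character with `replacement`.
sanitize_filename(name::AbstractString; replacement::Char = '-') =
    map(c -> c in PROBLEMATIC_CHARS ? replacement : c, name)

name = "sourcefile=path/to/my/sourcefile"
check_filename(name)      # warns and returns ['/']
sanitize_filename(name)   # "sourcefile=path-to-my-sourcefile"
```

Replacing characters is lossy (two different paths can map to the same name), which is one argument for only warning instead.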

> Would anyone be interested in a savename option to output a hash instead of the normal behaviour? That would help when the string becomes longer than the OS allows.

I think this is a good idea, but probably better to do it as a separate function. savename is already so heavy...

> Should we add a warning when path delimiters are part of savename or should we escape them?

How does this work on the actual name of the file? Isn't it impossible to save a file that contains / or \ in its name, at least on Windows? I'd say go for the warning.

Datseris avatar Jun 28 '20 11:06 Datseris

> How does this work on the actual name of the file? Isn't it impossible to save a file that contains / or \ in their name? At least in windows? I'd say go for the warning.

Well, somehow I ended up with this. And by the way, this is also why I wanted to use some kind of hash instead.

.
├── Ithreshup100k=5_b=0.16_liftlockdowndelay=14_localdir=
│   └── data
│       └── username
│           └── epiproject
│               └── dataset_20200525
│                   ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
│                   └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson
├── Ithreshup100k=5_b=0.2_liftlockdowndelay=14_localdir=
│   └── data
│       └── username
│           └── epiproject
│               └── dataset_20200525
│                   ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
│                   └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_subdivsize=100000_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson
└── Ithreshup=5_b=0.24_liftlockdowndelay=14_localdir=
    └── data
        └── username
            └── epiproject
                └── dataset_20200525
                    ├── _model=SIRsubdivision.MeanFieldLockdownAll.Population_timesel=complete_xcol=xi_xvals=(1=0,10=1,2=0,3=0,4=0.001,5=0.003,6=0.01,7=0.032,8=0.1,9=0.316)_ycol=lockdowndays_weightedmean.bson
                    └── _model=SIRsubdivision.MeanFieldLockdownAll.Population_timesel=complete_xcol=xi_ycol=lockdowndays_weightedmean.bson

JonasIsensee avatar Jun 29 '20 14:06 JonasIsensee

But do you still get the same thing if you escape the slashes?

Datseris avatar Jun 29 '20 15:06 Datseris

> I guess we should in general do more checks in savename() since there are more characters which can cause problems. Especially on windows the list is quite large (<>/?*|": etc.).
>
> What do you mean by a hash? Something like a base64 encoding or so?

Hm, I looked at base64, but my impression is that filenames would not get shorter. Base.hash looks more promising, even if it is not reversible.

> but do you still get same thing if you escape slashes?

How would you like me to escape them?

Doing // does not work, and \/ also errors.

JonasIsensee avatar Jun 29 '20 15:06 JonasIsensee
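
The impression about base64 can be checked directly: base64 encodes every 3 input bytes as 4 output characters, so the result is roughly a third longer than the input. A small sketch using the `Base64` standard library, with the example name from the top of this issue:

```julia
using Base64

name = "sourcefile=path/to/my/sourcefile"
encoded = base64encode(name)

# Each group of 3 input bytes becomes 4 output characters (the last group is padded),
# so the encoded name is longer than the original, not shorter.
@show sizeof(name) length(encoded)
```

Note also that the standard base64 alphabet itself contains `/`, so an encoded name could reintroduce the path separator problem.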

> or should we escape them?

You suggested that they can be escaped :P I never thought it was possible :P

Datseris avatar Jun 29 '20 18:06 Datseris

> Base.hash looks more promising even if it is not reversible.

I've tested a bit, and Base.hash(savename(...)) gives the same hash for the same input string. It is not invertible, but it is deterministic, which is one of the main purposes of savename. What I wonder is whether these hashes change from Julia version to Julia version.

Datseris avatar Jun 30 '20 07:06 Datseris

I don't know about that. Also, I was mostly thinking of using Base.hash(c) instead of savename(c). There is no point in still risking string rounding issues (a.k.a. 0.154 != 0.153 but "0.15" == "0.15") when we're not using the string as a filename anyway.

EDIT: If you are just using savename by itself, then of course you can just exchange it for Base.hash, but then you can't use produce_or_load anymore (which was my application).

JonasIsensee avatar Jun 30 '20 08:06 JonasIsensee
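
The two points above, rounded strings colliding and hashing the parameter container directly, can be illustrated with a short sketch. `str` and `hashname` are hypothetical helpers, not DrWatson functions. Using SHA-256 from the `SHA` standard library is an assumption made here to sidestep the open question of whether `Base.hash` values stay the same across Julia versions.

```julia
using SHA

a = Dict(:b => 0.154)
b = Dict(:b => 0.153)

# Rounded string representations can collide ...
str(d) = join(("$k=$(round(v, digits = 2))" for (k, v) in d), "_")
@assert str(a) == str(b)      # both are "b=0.15"

# ... while hashing the containers themselves keeps them apart.
@assert hash(a) != hash(b)

# Hypothetical helper: a fixed-length, deterministic file name from the parameters.
# Keys are sorted (by their string form) so the result does not depend on
# Dict iteration order; `repr` keeps the full precision of each value.
function hashname(d::AbstractDict; suffix = "bson")
    ks = sort!(collect(keys(d)); by = string)
    canonical = join(("$k=$(repr(d[k]))" for k in ks), "_")
    return bytes2hex(sha256(canonical)) * "." * suffix
end

hashname(a)   # 64 hex characters followed by ".bson"
```

The name has a fixed length regardless of how many parameters there are, at the cost of no longer being human-readable.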

By the way, as an alternative we could also think about a meta file, which would save the information in an external file: savename could provide a hash, and a JSON file could hold the parameters. Just thinking out loud...

tamasgal avatar Jul 06 '20 18:07 tamasgal
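
A minimal sketch of the meta-file idea, assuming a content hash as the base name and a sidecar file holding the readable parameters. The comment suggests JSON; TOML is swapped in here only because Julia ships a `TOML` standard library, and `write_sidecar` is a hypothetical name, not a DrWatson function.

```julia
using SHA, TOML

# Hypothetical helper: write the parameters to `<hash>.toml` in `dir`
# and return the hash-based id together with the path of the sidecar file.
function write_sidecar(dir::AbstractString, params::AbstractDict)
    # TOML tables need string keys; sorted output makes the text canonical.
    table = Dict(string(k) => v for (k, v) in params)
    text = sprint(io -> TOML.print(io, table; sorted = true))
    id = bytes2hex(sha256(text))[1:16]   # shortened hash used as the base name
    metafile = joinpath(dir, id * ".toml")
    write(metafile, text)
    return id, metafile
end

params = Dict(:b => 0.16, :liftlockdowndelay => 14, :localdir => "data/username/epiproject")
dir = mktempdir()
id, metafile = write_sidecar(dir, params)

# The readable parameters can be recovered from the sidecar file,
# as a Dict with string keys.
TOML.parsefile(metafile)
```

The data file itself would then be saved as `id * ".bson"` next to the sidecar, so paths inside the parameters never end up in a file name.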

That might be an option.

Have a look at https://github.com/invenia/JLSO.jl ! I just found this, and I think it definitely needs a shoutout in the docs. It includes a project and manifest as metadata with any file, so later you can just activate a file environment.

JonasIsensee avatar Jul 06 '20 18:07 JonasIsensee

Wait, can't this replace BSON.jl entirely...?

Datseris avatar Jul 06 '20 18:07 Datseris

Well, it says it uses BSON for storing the metadata, so you can't get rid of BSON entirely. However, maybe it's more stable regarding custom types. The Julia serializer works very well, though the docs say it only works reliably for the same Julia version. Maybe storing the state of the current install helps with that problem.

sebastianpech avatar Jul 06 '20 18:07 sebastianpech

https://docs.julialang.org/en/v1/stdlib/Serialization/

> In general, this process will not work if the reading and writing are done by different versions of Julia, or an instance of Julia with a different system image.

sebastianpech avatar Jul 06 '20 18:07 sebastianpech

I kinda missed that discussion, but the idea with the metadata is basically what my metadata implementation is about. Regarding hashing, I found that this

https://github.com/sebastianpech/DrWatsonSim.jl/blob/820d26eaea671a798788cf360e112b316202d14c/src/Metadata.jl#L50-L51

works quite well.

sebastianpech avatar Jul 06 '20 18:07 sebastianpech