
archiving modules

Open goldingn opened this issue 8 years ago • 10 comments

It would be great to be able to deprecate & archive modules that people no longer think are useful, in a similar way to CRAN's archiving of packages. If we move these modules to a separate place then they won't appear to people looking for new modules, but could always be fetched to reproduce an existing workflow.

The tidiest way of doing this might be to create an archive branch of the modules repo. Whenever a module is deprecated, it can then be removed from master and moved to archive.

When listing available modules, zoon, the website, etc. need only look in master, but when re-running a workflow that uses a deprecated module, zoon::GetModule() could look first in master and then in archive.

This would require changes to zoon::GetModule() and zoon::LoadModule(), an adjustment of the solution in #290 so that the branch is handled separately from the repo, and possibly other changes.
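
A minimal sketch of the master-then-archive lookup described above. `FindModuleBranch()` and the `module_exists` callback are hypothetical names, not part of zoon; the callback stands in for whatever check (for example an HTTP request against the modules repo) zoon::GetModule() would actually make, and the toy repo contents are invented for illustration.

```r
# Hypothetical helper, not part of zoon: return the first branch that
# contains the requested module, checking `master` before `archive`.
# `module_exists` is an assumed callback taking (module, branch) and
# returning TRUE or FALSE.
FindModuleBranch <- function(module, module_exists,
                             branches = c("master", "archive")) {
  for (branch in branches) {
    if (isTRUE(module_exists(module, branch))) {
      return(branch)
    }
  }
  stop("module '", module, "' not found in: ",
       paste(branches, collapse = ", "))
}

# Toy stand-in for the repo contents, for illustration only.
toy_repo <- list(master = c("AirNCEP"), archive = c("OldModule"))
toy_exists <- function(module, branch) module %in% toy_repo[[branch]]

FindModuleBranch("AirNCEP", toy_exists)    # found on master
FindModuleBranch("OldModule", toy_exists)  # falls back to archive
```

Passing the existence check in as a function keeps the search order separate from how the repo is queried, so the same logic would work for a local clone or a remote request.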

goldingn avatar May 25 '16 22:05 goldingn

@AugustT you mentioned that zoon handles module versions now. Does zoon::RerunWorkflow() grab the corresponding old versions of modules though? It looks to me like it can only grab the latest version and record which version was used.

If not, we'd need to implement that at the same time as this. I imagine the solution to both of these would involve either:

  • using the GitHub API to look back through the commit history of each module to find the corresponding version number, or
  • the modules repo storing a lookup table of module version by commit sha (added to/refreshed on each build/module upload)

goldingn avatar May 25 '16 22:05 goldingn

At the moment the version number is only tracked in the workflow object and is not used in re-running. A look up table in the modules repo sounds like a good idea.

AugustT avatar May 26 '16 08:05 AugustT

I've had a bash at this, and have working code to build a lookup table in this gist, which gives us something like:

head(lookup <- ModuleLookup())
             module     version                                      sha                when deprecated
1           airNCEP unversioned 3da6f0d05212f1fb3560b6d309afcfdac9e1a72d 2014-08-05 16:16:07       TRUE
2           AirNCEP unversioned 89e43470d474cea8a7dfbcacc9e125d90b65244f 2014-08-08 10:39:16       TRUE
3           AirNCEP         1.0 342cf7a15acea54bddf747953780364fc2ddf2d2 2015-11-13 13:18:24      FALSE
4 anophelesPlumbeus unversioned 3da6f0d05212f1fb3560b6d309afcfdac9e1a72d 2014-08-05 16:16:07       TRUE
5 AnophelesPlumbeus unversioned 89e43470d474cea8a7dfbcacc9e125d90b65244f 2014-08-08 10:39:16       TRUE
6 AnophelesPlumbeus         1.0 342cf7a15acea54bddf747953780364fc2ddf2d2 2015-11-13 13:18:24      FALSE

The SHAs and dates are those of the earliest commit in which that version was present.
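
As a rough illustration of how such a table could be queried, here is a sketch that finds the commit SHA for a given module version. `ShaForVersion()` is a hypothetical helper, not a function from the gist or from zoon; the data frame reproduces three rows of the output above and omits the `when` column for brevity.

```r
# Three rows copied from the output above, for illustration.
lookup <- data.frame(
  module = c("airNCEP", "AirNCEP", "AirNCEP"),
  version = c("unversioned", "unversioned", "1.0"),
  sha = c("3da6f0d05212f1fb3560b6d309afcfdac9e1a72d",
          "89e43470d474cea8a7dfbcacc9e125d90b65244f",
          "342cf7a15acea54bddf747953780364fc2ddf2d2"),
  deprecated = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: SHA of the commit recorded for a module version.
# Module names are matched case-sensitively, as in the table.
ShaForVersion <- function(lookup, module, version) {
  hit <- lookup$sha[lookup$module == module & lookup$version == version]
  if (length(hit) == 0) {
    stop("no entry for ", module, " version ", version)
  }
  hit[1]
}

ShaForVersion(lookup, "AirNCEP", "1.0")
```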


The two main questions now are:

  1. Where do we host this lookup table?
  2. How do we make sure it gets updated every time a module is added to the modules repo?

The easiest option is probably to create an executable bash script modules/configure to run this on every build, and put the resulting lookup table at modules/inst/extdata/lookup.csv. People would have to remember to build before committing via GitHub, though hopefully most submissions will happen through the website soon. Will implement that soon unless there are objections.


The next step will be grabbing specific versions when re-running workflows. I'll amend zoon::GetModule() and zoon::LoadModule() so that they take the version as an argument, and grab the right code. We'll then need to get zoon::RerunWorkflow() & friends to make use of that argument, probably also with an option to use the latest version instead of the one used before.
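
One way the "grab the right code" step might work is to address the module file at a specific commit, since GitHub serves raw files for any ref, including a commit SHA. The sketch below only builds the URL. `ModuleURLAtSha()` is a hypothetical name, and the repo name `zoonproject/modules` and the `R/<module>.R` layout are assumptions made for illustration.

```r
# Hypothetical helper, not part of zoon: build the raw-file URL for a
# module as it existed at a given commit.
ModuleURLAtSha <- function(module, sha,
                           repo = "zoonproject/modules",  # assumed repo name
                           dir = "R") {                   # assumed file layout
  sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s.R",
          repo, sha, dir, module)
}

ModuleURLAtSha("AirNCEP", "342cf7a15acea54bddf747953780364fc2ddf2d2")
```

The resulting URL could then be passed to whatever download and parsing code zoon::GetModule() already uses for the latest version.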

goldingn avatar May 27 '16 07:05 goldingn

Comments on the lookup table and the plan would be welcome!

goldingn avatar May 27 '16 07:05 goldingn

Also, this means that an archive branch is unnecessary. All past modules are accessible, and they are considered deprecated if they are not present at the HEAD of the repo (either due to a version bump or deletion).
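
The deprecation rule above can be sketched as a set comparison: a (module, version) pair is deprecated when it does not appear among the pairs present at HEAD. `FlagDeprecated()` and the `at_head` table are hypothetical names used for illustration, and the rows are taken from the lookup output earlier in the thread.

```r
# Hypothetical helper: mark every (module, version) pair that is not
# present at the HEAD of the repo as deprecated.
FlagDeprecated <- function(lookup, at_head) {
  key <- function(x) paste(x$module, x$version, sep = "@")
  lookup$deprecated <- !(key(lookup) %in% key(at_head))
  lookup
}

# All recorded versions of one module.
lookup <- data.frame(
  module = c("airNCEP", "AirNCEP", "AirNCEP"),
  version = c("unversioned", "unversioned", "1.0"),
  stringsAsFactors = FALSE
)

# Assumed state of HEAD: only AirNCEP 1.0 is present.
at_head <- data.frame(module = "AirNCEP", version = "1.0",
                      stringsAsFactors = FALSE)

FlagDeprecated(lookup, at_head)
```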

goldingn avatar May 27 '16 07:05 goldingn

Hi Nick, great stuff, that gist looks great. Did it take long to run, going back through all those commits?

The issues around maintaining the table only exist when people are not using the website to push modules as the website could have a script that updated the table as needed. This makes me think it probably isn't worth too much of your time finding a solution to that problem as the need is likely to go away.

The rest of the plan looks good.

What would the default behaviour for the re-run functions be? Should they always use the same versions as used in the workflow object? ChangeWorkflow has an argument forceReproducible, which ensures the module is on the repo, but could also be used to dictate whether the same module versions or the newest module versions are used. I would suggest that these functions should by default use the same versions of the modules as in the workflow object; I think this is the behaviour users would expect. However, it might be worth including a message that informs them when old versions of modules are being used.

AugustT avatar May 27 '16 08:05 AugustT

Only about 10s to run that on my machine. This runs through the whole repo on every build, but could be adapted in the future to only run through recent commits and update. I imagine that would take <1s.

I agree re. the website running the script. Should we add the functions in the gist to zoon (unexported) for the website to access? Is that what you've been doing with the parsing functions?

goldingn avatar May 28 '16 01:05 goldingn

I agree, I was imagining that the default behaviour on re-run would be to use the same versions as the original. That's probably what users would expect, given we say workflows are reproducible. I would probably just emphasise that in the documentation for zoon::RerunWorkflow() and zoon::ChangeWorkflow(). Maybe we should get feedback at the workshop on what people would expect though.

I think it makes sense to keep this separate from forceReproducible since they do different things. I would probably add an argument recentVersions or something that would allow users to switch the reproducibility off.
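
A sketch of how such a switch might behave, including the message suggested earlier in the thread for when an old version is used. `ChooseVersion()` is a hypothetical helper, and `recentVersions` is the tentative argument name floated above, not an existing zoon argument.

```r
# Hypothetical helper: decide which version of a module a re-run should use.
# `recorded` is the version stored in the workflow object; `latest` is the
# version currently at the HEAD of the modules repo.
ChooseVersion <- function(module, recorded, latest, recentVersions = FALSE) {
  if (recentVersions) {
    return(latest)
  }
  if (!identical(recorded, latest)) {
    message("Using version ", recorded, " of ", module,
            " (latest is ", latest, ")")
  }
  recorded
}

# Default: reproduce the original run, with a message about the old version.
ChooseVersion("AirNCEP", recorded = "unversioned", latest = "1.0")

# Opt out of reproducibility and take the newest version instead.
ChooseVersion("AirNCEP", recorded = "unversioned", latest = "1.0",
              recentVersions = TRUE)
```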

goldingn avatar May 28 '16 01:05 goldingn

Currently the code for use on the website is separate from the code in zoon, as it is not used by users in zoon. I have a separate project on my computer for the web code. I suggested creating a GitHub repo for this code a while ago but the idea didn't float.

AugustT avatar May 31 '16 08:05 AugustT

Cool. Ideally we'd have a GitHub repo for all of the website code, to which these bits could be added. Worth discussing at the workshop.

goldingn avatar May 31 '16 22:05 goldingn