Ability to get new containers while having all prior or used containers frozen to specific version
Currently it would not be possible to update from the original containers dataset with the purpose of only getting new containers, while keeping current ones (possibly already used in some analysis) at their current version -- "merging" of `.datalad/config` with the remote version would update all `image` configs to the new version.
Possible ways:
- provide some `scripts/freeze_containers` script which would make a duplicate section for the container after its original section in `.datalad/config`, e.g.:

  ```
  [datalad "containers.bids-validator"]
      updateurl = shub://ReproNim/containers:bids-validator--1.2.3
      image = images/bids/bids-validator--1.2.3.sing
      cmdexec = {img_dspath}/scripts/singularity_cmd run {img} {cmd}
  ...
  ### FROZEN CONTAINERS
  [datalad "containers.bids-validator"]
      image = images/bids/bids-validator--1.2.3.sing
  ```

  so whenever a new version is to be merged, a conflict would most likely occur at the end of the file, but at least it would be easy to troubleshoot, and the original "full" record would get its new `image` entry without affecting the effective value of `image` for the container
- enhancement to the above: we can prepopulate that trailing section within this dataset:

  ```
  ### FROZEN CONTAINERS
  [datalad "containers.bids-validator"]
  # end of datalad "containers.bids-validator"
  [datalad "containers.bids-fmriprep"]
  # end of datalad "containers.bids-fmriprep"
  ...
  ### END OF FROZEN CONTAINERS
  ```

  and make sure that for every container we add (which is to stay above `### FROZEN CONTAINERS`) we also add such a blank section. Then merges should proceed fine, and users would be able to freeze the containers they need. So the only things needed within this repo are to make sure that new container entries are added correctly, and to provide that script, which would also need to understand this format to add new `image` entries.
- provide some `scripts/freeze_containers` script which would adjust `image` entries within `.datalad/config` for specified/all containers, so it would cause a conflict upon merge and require conscious conflict resolution (or just `git merge -s ours`, but I am afraid the trailing hunk could then swallow newly added container configs) to decide whether or not to upgrade a specific image version to the new one. There could even be some custom merge helper to perform the merge by simply adopting only new sections of the config.
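As a rough sketch of the first option above, such a `scripts/freeze_containers` helper could use `git config --file` to read the current value and append a frozen duplicate (the function name is illustrative, and a real script would need to insert into the marker section rather than blindly appending at the end):

```shell
# Illustrative sketch only: duplicate a container's image entry below a
# "### FROZEN CONTAINERS" marker.  git config returns the LAST occurrence
# of a key, so the frozen value wins on read.
freeze_container () {
    cfg=$1; name=$2
    # look up the currently configured image for this container
    img=$(git config --file "$cfg" --get "datalad.containers.$name.image")
    {
        printf '### FROZEN CONTAINERS\n'
        printf '[datalad "containers.%s"]\n' "$name"
        printf '\timage = %s\n' "$img"
    } >>"$cfg"
}

# usage with a throwaway config file standing in for .datalad/config
cfg=$(mktemp)
printf '[datalad "containers.demo"]\n\timage = images/demo--1.0.sing\n' >"$cfg"
freeze_container "$cfg" demo
git config --file "$cfg" --get datalad.containers.demo.image
```

After this, a merge that changes the original `image` line would leave the frozen trailing entry untouched, which is exactly the conflict-surfacing behavior described above.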
Any other ways @kyleam @mih @bpoldrack which might come to your mind?
I was not sure whether this would be anything to tackle at the datalad-containers level, since it is more relevant to such a "datalad containers" distribution, so I decided to file it here first.
Hm. To me this looks somewhat dirty at first glance. After all, we have a version-controlled dataset and are building some kind of hackish "version control" on top of it here. Any old version would still have a reference in earlier commits - technically it's not lost on such an update. We should take advantage of what's there, I think. Now, it depends on what you mean by "still needed". References to an old version should simply reference the commit instead of just a path, and thereby you'd still have all you need.
If it is about having two versions available in the worktree, then I think you should just use two container (sub-)datasets. You could have two subdatasets where one has the newer version available in HEAD and the other one is not updated with respect to that image. Referencing names for containers in subdatasets would distinguish both. I think that's the way to go if you need multiple versions.
Clarification: Of course, it doesn't need subdatasets. You can also simply reference two images within a single dataset.
But maybe I misunderstood your aim. If it's just about not updating them (not keeping two versions, but only the old one), then `update` should be used with path arguments to specify what to update and what not. If it's about some kind of configuration/automation of exactly that, then I don't see why containers would be special in any way. This should be a configuration for `update` itself, like you can configure `git push`, for example, by specifying patterns of what to push where. It should look similar for `update`, in my opinion.
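For comparison, the `git push` configuration mentioned here is declared as refspec patterns under `remote.<name>.push`; a selective-`update` configuration could plausibly take a similar declarative shape (a throwaway config file is used here purely for illustration):

```shell
# git push selectivity lives in config as refspecs: with this setting,
# "git push origin" would only push main, regardless of other branches.
# A throwaway config file stands in for a real repository's .git/config.
cfg=$(mktemp)
git config --file "$cfg" remote.origin.push 'refs/heads/main:refs/heads/main'
git config --file "$cfg" --get remote.origin.push
```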
In case that was just another misinterpretation of what you want to achieve, I need further explanation of the goal ;-)
I think this situation is no different from a general need to have multiple versions of a file simultaneously accessible. The low-tech solution is to encode the version into the file name. If there is a container dataset that aims to provide multiple versions simultaneously, I see no reason not to use this approach.
If it is just about a dataset that used containers from a specific commit of a container dataset, this information is already encoded in a past commit. I do not see why this has to be maintained in the worktree. Or how it could be maintained in the worktree, as any update/merge/whatever needs to look into any such file and make sure that previous content gets preserved.
I'd stay clear of file content manipulation and add additional (config) files, if needed.
Thank you @bpoldrack and @mih -- I will digest it better and reply in greater detail.
Re maintaining multiple versions in the same tree -- although YODA should (eventually) rule the world, I see this "distribution" dataset as also valuable for folks with centralized deployment, where they could reuse these containers as-is even if they don't embed them into their datasets (since they might not use datasets yet). Also, having these multiple versions allows for the use case I am targeting with this issue - the ability to gain access to newer versions of the containers while still easily being able to use previous ones (e.g. for consistency of operation in an ongoing study). It is like using Debian unstable but not upgrading all packages at once, only selected ones, "until ready" for the full upgrade. So the situation here is a bit different from a "dataset versioning" case where you do want to upgrade the entire dataset. Here you might want to "upgrade" in order to execute only some new containers.
[ I'm still not sure how I feel about this repository's approach of storing each version in the working tree, but assuming that setup and focusing on the possible approaches for selective freezing of a subset of containers... ]
Talking to you in person today, I suggested using git config's `include.path` to point to a separate config file for the frozen overrides, but then you pointed out that `git config --file ...` ignores `include` statements by default (and thus `dataset.config` does, because it passes `.datalad/config` as the file). We also mentioned using the untracked `.git/config`, which would work in terms of the initial run and provenance information but wouldn't be ideal in the sense that the frozen config wouldn't persist across clones. It just occurred to me that a hybrid setup might work well:
- have a tracked config file under `.datalad/` that contains the overrides for the frozen containers
- enable frozen containers in a local repository with an `include.path` in `.git/config` that points to the tracked config file
That would allow you to manage the custom config in a tracked file without worrying about conflicts. And testing this out quickly, DataLad's config system and datalad-containers seem fine with it.
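Concretely, that hybrid setup might look like the following sketch (the file name `frozen.cfg` is made up here; note that a relative `include.path` in `.git/config` is resolved relative to `.git/config` itself, hence the leading `../`):

```shell
# Sketch of the hybrid freeze setup in a fresh throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# 1. tracked config file holding the frozen overrides
mkdir -p .datalad
printf '[datalad "containers.bids-validator"]\n\timage = images/bids/bids-validator--1.2.3.sing\n' \
    >.datalad/frozen.cfg
git add .datalad/frozen.cfg

# 2. local (untracked, clone-specific) activation of the overrides;
#    a relative include.path is resolved against .git/config
git config --local include.path ../.datalad/frozen.cfg

# reading repository config (without --file) now sees the frozen value
git config --get datalad.containers.bids-validator.image
```

Clones would receive `frozen.cfg` but stay unfrozen until they opt in with the one local `include.path` setting, which matches the "persist across clones, activate locally" split described above.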