Support for named volumes

Open PauloMigAlmeida opened this issue 3 years ago • 7 comments

Is your feature request related to a problem? Please describe.

I cannot count how many times I've (accidentally) deleted my 'persistent' paths before as I'm always tempted to have those mapped folders within my project directory structure for tidiness reasons 😬

That's clearly my mistake above anything else but this also made evident some of the hidden benefits of docker volumes. I was wondering if we could get something similar.

Describe the solution you'd like

The solution I propose is to use ~/.singularity/volumes as the default location to store named volumes.

This location could be overridden using an env var (let's say SINGULARITY_VOLUMES_PATH).

The creation process would be something like singularity volume create|remove <volume_name>

And last but not least, when the user specifies a bind path whose source matches the name of an existing named volume, it would be mapped to the location where the named volume resides, e.g. singularity run --bind <volume_name>:/my/path
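
To make the proposal concrete, here is a minimal Python sketch of the lookup described above. It is only an illustration: SINGULARITY_VOLUMES_PATH and the ~/.singularity/volumes layout are the names suggested in this issue, not existing Singularity features, and the helpers volumes_root and resolve_bind are hypothetical.

```python
import os
from pathlib import Path

# Names proposed in this issue; neither exists in Singularity today.
ENV_VAR = "SINGULARITY_VOLUMES_PATH"
DEFAULT_ROOT = Path("~/.singularity/volumes")


def volumes_root(env=None):
    """Return the directory that holds named volumes (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    return Path(override) if override else DEFAULT_ROOT.expanduser()


def resolve_bind(spec, env=None):
    """Rewrite '<volume_name>:/dest' to '<volume_dir>:/dest' if the volume exists.

    Anything that does not name an existing volume is returned unchanged,
    so ordinary host-path binds keep working.
    """
    src, sep, dest = spec.partition(":")
    # Only plain names (no path separators, not '.' or '..') can be volume names.
    is_plain_name = bool(src) and "/" not in src and src not in (".", "..")
    if sep and is_plain_name:
        candidate = volumes_root(env) / src
        if candidate.is_dir():
            return f"{candidate}:{dest}"
    return spec
```

With a volume directory named pgdata under the volumes root, resolve_bind("pgdata:/var/lib/data") returns the same spec with the volume's directory substituted as the source; a spec such as /host/dir:/mnt, or a name with no matching volume, passes through untouched.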

Describe alternatives you've considered

I initially considered adding this feature to singularity-compose but it was suggested to me that maybe having this feature in singularity could be a better fit for the problem.

https://github.com/singularityhub/singularity-compose/issues/50

Additional context

I was wondering if there is any interest in having such a feature or if this is a no-go type of feature for whatever reason. I'm keen to assist with the development of that if necessary.

PauloMigAlmeida avatar Oct 06 '21 22:10 PauloMigAlmeida

In my view, this one is a bit difficult. It's definitely something that would be nice to have, but the devil is in the details.

For example, in Docker, a named volume is auto-created if it does not exist when a container run attempts to use it. This is probably something people would expect here. We strive to ensure that SingularityCE works well by default in common HPC environments. In these environments a container is quite frequently run in parallel... either as completely separate (independent but concurrent) invocations, or a coordinated invocation like an MPI job or similar. These runs happen across various machines and there is no singularity daemon or central service that manages them. Filesystems like $HOME are generally network filesystems. There can be cache consistency and locking issues (or lack of those features).

If we add named volumes, and there are parallel runs, how should we handle creation of the volume if multiple singularity instances are all trying to use a named volume at once? How can we detect, and handle, any issues with cache consistency or locking on the underlying filesystem that may make this process dangerous or error prone?

These issues are certainly still present when you manually --bind a directory in, but they aren't 'hidden'. Supporting named volumes implies some special magic around volume creation and management that should always work, while with a --bind it's pretty clear it is only mounting a host directory into the container.

What would be good here, if you'd like to pursue it, is to think a bit about this type of stuff and try to define very clearly and explicitly what should happen in normal workflows, and in some pathological cases. E.g. what if I put volume create or volume remove in a batch script and submit it for parallel execution? How should singularity react?
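
As one possible answer for the batch-script case, the sketch below gives create and remove well-defined outcomes without any daemon. It is an assumption-laden illustration, not how Singularity behaves: volume_create and volume_remove are hypothetical helpers, and the approach relies on the filesystem honouring the usual atomic mkdir and rename semantics, which is exactly what may not hold on some network filesystems.

```python
import os
import shutil
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def volume_create(root, name):
    """Create a volume directory; report 'exists' if another caller got there first."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    try:
        # mkdir without exist_ok either creates the directory or raises,
        # so exactly one concurrent caller can win.
        (root / name).mkdir()
        return "created"
    except FileExistsError:
        return "exists"


def volume_remove(root, name):
    """Remove a volume directory; report 'missing' if it is already gone."""
    root = Path(root)
    tombstone = root / f".{name}.removing-{uuid.uuid4().hex}"
    try:
        # Rename first: only one caller can move the directory away,
        # and the slow recursive delete then happens on a private name.
        os.rename(root / name, tombstone)
    except FileNotFoundError:
        return "missing"
    shutil.rmtree(tombstone)
    return "removed"


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    # Simulate a batch script that runs `volume create results` eight times at once.
    with ThreadPoolExecutor(max_workers=8) as pool:
        outcomes = list(pool.map(lambda _: volume_create(root, "results"), range(8)))
    print(sorted(outcomes))
```

On a local POSIX filesystem one caller reports "created" and the rest report "exists". The sketch does nothing to protect a container that is still using a volume while another job removes it; that case would need its own explicit rule.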

With enough examples we can then start to see clearly what makes technical sense, and how this might be implemented.

A minor thought - we've recently added the --mount option that mirrors (partially) Docker's --mount syntax... so we'd probably want to use that for named volumes. --bind can't really be overloaded for named volumes as you can't easily distinguish between a volume name and an identical relative path.
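
The ambiguity can be shown with a small Python sketch. Everything here is hypothetical: candidates_for_bind_source and parse_mount are illustrative helpers, the parser is a simplified key=value splitter rather than the real option parser, and type=volume is borrowed from Docker's --mount syntax and is not something Singularity's --mount is known to accept.

```python
from pathlib import Path


def candidates_for_bind_source(src, cwd, volumes_root):
    """List every way a --bind source could be interpreted (illustration only)."""
    found = []
    if (Path(cwd) / src).is_dir():
        found.append("relative path")
    if "/" not in src and (Path(volumes_root) / src).is_dir():
        found.append("named volume")
    return found


def parse_mount(spec):
    """Parse a simplified comma-separated key=value mount spec into a dict."""
    fields = dict(item.split("=", 1) for item in spec.split(","))
    # 'volume' is a hypothetical type used here to show how an explicit
    # type field removes the guesswork that a bare --bind source requires.
    if fields.get("type") not in ("bind", "volume"):
        raise ValueError(f"unsupported mount type: {fields.get('type')!r}")
    return fields
```

If the working directory contains a folder named data and a volume named data also exists, candidates_for_bind_source("data", ...) returns both interpretations, whereas a spec such as type=volume,source=data,destination=/data states the intent outright.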

dtrudg avatar Oct 07 '21 14:10 dtrudg

+1 what @PauloMigAlmeida said - it would be very nice to implement for Singularity compose! I think it would be rather simple? Just have some place to keep volumes in the Singularity cache and then bind to them. The use case is that it's easy to accidentally delete volumes that are in the present working directory. TLDR: a named volume is just a managed filesystem bind (to still get the same features you would with a normal bind).

vsoch avatar Oct 07 '21 15:10 vsoch

If an explicit volume create being required in order to use a volume is an acceptable trade-off, and anonymous volumes are not implemented, then this is relatively straightforward. So long as the failure paths for volume create / volume remove being called concurrently (accidentally) are well defined and make sense to the user.

If we want to operate in the way people are familiar with from docker, it is more difficult, as then the implicit creation and, e.g., cleanup of anonymous volumes have no persistent management process to co-ordinate things. I would have anticipated that, from a singularity-compose standpoint, staying as close to docker as possible would be beneficial?

Edit - I guess what I'm getting at here is that this is the type of feature in which we really need a complete set of specific use cases in order to define the level of complexity that will be necessary in the technical implementation.

dtrudg avatar Oct 07 '21 16:10 dtrudg

@PauloMigAlmeida you have more experience with wanting this feature - do we need the complexity of what docker does?

vsoch avatar Oct 07 '21 16:10 vsoch

@vsoch: @PauloMigAlmeida you have more experience with wanting this feature - do we need the complexity of what docker does?

No, I don't think we need to have the same level of complexity that docker has for their implementation of this feature.

@dtrudg: If an explicit volume create being required in order to use a volume is an acceptable trade-off, and anonymous volumes are not implemented, then this is relatively straightforward.

I agree with that. Explicit volume creation seems the way to go here, as it keeps to a minimum the 'hidden magic' in which the devil can hide.

@dtrudg: So long as the failure paths for volume create / volume remove being called concurrently (accidentally) are well defined and make sense to the user.

I think that if we approach that purely from a race condition point of view, we will go down a rabbit hole that won't progress much as, given the absence of a daemon to transactionally control when and how a volume can be deleted, things get 'too fun' 🥲

I suggest that we approach this feature as @vsoch succinctly described: a named volume is just a managed filesystem bind (to still get the same features you would with a normal bind).

You can expect precisely the same behaviour you would get if, instead of running volume create <name> and volume remove <name>, the user were running mkdir <name> and rm -rf <name>. So the benefits of using the volume mechanism would be:

  1. Volumes/folders will reside in a different place than the project, which helps a lot when you have lousy people like me working on the HPC server 😓 (remember, the default location could be changed through either singularity.conf or an ENV var to ensure the right filesystem with the correct capabilities is in place).
  2. Most importantly, this would significantly reduce the command execution differences between environments. Today we have to remap binds, whether we are using the CLI or singularity-compose, every time we run on a different machine... which is not only a pain in the bum but also very error prone.

EDIT: One thing that still isn't clear to me is whether the best place to implement this is singularity or singularity-compose. I will defer this decision to you both as I can see both of them implementing it.

PauloMigAlmeida avatar Oct 07 '21 22:10 PauloMigAlmeida

Hi @vsoch @dtrudg, just following up on this thread.

Have you guys thought about whether this functionality fits either singularity or singularity-compose (or neither of them 😅)?

PauloMigAlmeida avatar Oct 28 '21 07:10 PauloMigAlmeida

My opinion is the same - that it should be supported in Singularity natively, and then extended to singularity-compose. Cheers!

vsoch avatar Oct 28 '21 13:10 vsoch