Process writes to the host home directory, which is mounted in the container
Hi, within the epitopeprediction pipeline, the tool `mhcflurry` tries to create a folder in the `HOME` directory to store some downloaded data:
- With the `docker` profile on Mac, this causes a `permission denied` error, because `mhcflurry` tries to create the folder `/.local` (I am not sure why it does not use the home directory in this case; @apeltzer mentioned it could be a Mac-specific problem). In the past, for some pipelines the same problem was addressed by commenting out the `docker.runOptions = '-u \$(id -u):\$(id -g)'` line in the `nextflow.config` (https://github.com/nf-core/tools/pull/351, https://github.com/nf-core/mhcquant/pull/104). This way, the commands are executed as root again, which prevents this particular `permission denied` error but may also give rise to other errors again (https://github.com/nf-core/tools/pull/351). (Another solution for this particular problem is https://github.com/nf-core/epitopeprediction/pull/52.)
- With Singularity, the folder `.local` is created in the actual host `HOME` directory, and everything seems smooth at first glance.
However, mounting the host `HOME` directory into the container, which lets tools read from and write to it, potentially unnoticed, is not ideal from a reproducibility perspective and should probably be avoided.
See also a longer discussion by @lkuchenb, @drpatelh, @pontus and @pditommaso about this topic on the Slack help channel.
My analysis here would be that when running Docker with `-u $(id -u):$(id -g)`, there is no user information available inside the container, so `$HOME` will likely be set to `/`, and anything that relies on creating files there should fail (as far as I understand, this should also happen with Docker on Linux).
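To illustrate this analysis, here is a minimal Python sketch (not `mhcflurry`'s actual code) of how a `~`-relative path resolves depending on `$HOME`. With `HOME=/`, `~/.local` becomes `/.local`, which a non-root user cannot create. The helper name and the `/home/alice` path are hypothetical.

```python
import os
import posixpath


def data_dir_for_home(home: str, rel: str = "~/.local") -> str:
    """Expand a '~'-relative path the way a POSIX tool would, for a given $HOME."""
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        # posixpath.expanduser replaces a leading '~' with $HOME (trailing '/' stripped)
        return posixpath.expanduser(rel)
    finally:
        # Restore the caller's environment
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


if __name__ == "__main__":
    # docker run -u UID:GID without a passwd entry: HOME falls back to "/"
    print(data_dir_for_home("/"))            # /.local
    # A real (or dedicated) home directory
    print(data_dir_for_home("/home/alice"))  # /home/alice/.local
```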
Does the tool allow specifying a custom location for the `.local` folder? IMO that would be the best solution.
(In reply to Pontus Frehult's comment of Thu, Sep 3, 2020, 3:35 PM.)
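As a generic sketch of what such a custom location could look like, assuming a tool resolves its data directory in this order: an explicit override variable, then `$XDG_DATA_HOME`, then the XDG default `$HOME/.local/share`. `TOOL_DATA_DIR` and the app name `mytool` are hypothetical; this is not a description of how `mhcflurry` actually behaves.

```python
import posixpath


def resolve_data_dir(app: str, env: dict) -> str:
    """Return the directory a tool should store downloaded data in.

    Precedence: a tool-specific override (TOOL_DATA_DIR, a hypothetical name),
    then $XDG_DATA_HOME, then the XDG default $HOME/.local/share.
    """
    # 1. Explicit override, e.g. a directory inside the task's work dir
    override = env.get("TOOL_DATA_DIR")
    if override:
        return override
    # 2. XDG_DATA_HOME is only honoured if it is an absolute path
    xdg = env.get("XDG_DATA_HOME", "")
    if posixpath.isabs(xdg):
        return posixpath.join(xdg, app)
    # 3. Fallback; with HOME="/" this resolves to /.local/share/<app>
    home = env.get("HOME", "/").rstrip("/")
    return f"{home}/.local/share/{app}"
```

With an override like this, a pipeline could point the download directory at the task's work directory without touching `HOME`. Whether `mhcflurry` offers an equivalent setting would need to be checked in its documentation.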
IMHO, having the host home mounted and set as `$HOME` in the container is a general source of side effects; e.g., rc files of any tool used inside the container may break reproducibility. Providing a dedicated, empty host folder as `$HOME` inside the container would avoid such issues.
I also think pointing `HOME` somewhere else would be best (a new temporary directory, or possibly the current work directory for simplicity).
Configuring individual tools seems to require additional testing and unnecessarily introduces disparities between the various runtime environments (Docker, Singularity, Conda).
As mentioned, tools reading rc files can break reproducibility in a way that seems difficult to catch in testing.
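Along those lines, a minimal Python sketch of running a process with `HOME` pointed at a fresh temporary directory that is removed afterwards. It runs the command directly rather than inside a container, and `run_with_clean_home` is a hypothetical helper; the point is only that anything the tool writes to `~` lands in a throwaway directory instead of the user's real home.

```python
import os
import subprocess
import sys
import tempfile


def run_with_clean_home(cmd: list[str]) -> tuple[str, str]:
    """Run cmd with HOME set to a fresh, empty temporary directory.

    Returns (home_dir_used, stdout). The directory is deleted afterwards,
    so nothing the tool writes to ~ leaks into later runs or the real home.
    """
    with tempfile.TemporaryDirectory(prefix="task-home-") as home:
        env = dict(os.environ, HOME=home)
        result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
        return home, result.stdout.strip()


if __name__ == "__main__":
    # The child process reports the temporary directory as its home
    probe = [sys.executable, "-c", "import os; print(os.path.expanduser('~'))"]
    home, seen = run_with_clean_home(probe)
    print(home == seen)  # True on POSIX systems
```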
This looks very similar to an issue mentioned on Slack by @maxibor.
@skrakau - did you try creating an empty directory in the container and then setting the `$HOME` env variable to it in the Dockerfile? E.g., here is a super crude example that would probably break, but hopefully you get the idea:
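# mkdir (not touch, which creates files) ensures /home is a directory; chmod makes it writable by any UID
RUN mkdir -p /home
RUN chmod 777 /home
ENV HOME=/home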
If this is a general solution, we could add it to the nf-core base Docker image that all other custom images are built from, and it should work for everyone. Maybe. It might cause other problems.
For consistency (with respect to not reading random configuration files), it's probably desirable to have Docker and Singularity behave similarly (ignore bound directories / set up a home directory). That likely means a static solution baked into the image runs into issues with read-only mounts for other engines (e.g. Singularity).
I'm not sure if we could set something up that will be sourced by the called process, but I don't immediately see how to do it for Docker.
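A sketch of what "a dedicated, empty `HOME` for every engine" could look like at the command-line level, assuming a per-task host directory already exists (e.g. inside the Nextflow work directory). `container_cmd`, the image names and the paths are hypothetical; `-u`, `-v`, `-e` (Docker) and `--home src:dest` (Singularity) are existing flags, but their exact behavior should be checked against the installed versions. How Nextflow would create and pass that per-task folder remains the open question above.

```python
def container_cmd(engine: str, image: str, cmd: list[str], home: str,
                  uid: int, gid: int) -> list[str]:
    """Build a container command whose $HOME is `home`, a dedicated empty host dir.

    Only the command line is built here; creating `home` per task is assumed
    to happen elsewhere.
    """
    if engine == "docker":
        return [
            "docker", "run", "--rm",
            "-u", f"{uid}:{gid}",    # run as the host user, as with docker.runOptions
            "-v", f"{home}:{home}",  # bind the dedicated directory at the same path
            "-e", f"HOME={home}",    # and make it the home directory
            image, *cmd,
        ]
    if engine == "singularity":
        # --home src:dest binds src as the home directory instead of the host $HOME
        return ["singularity", "exec", "--home", f"{home}:{home}", image, *cmd]
    raise ValueError(f"unsupported engine: {engine}")


if __name__ == "__main__":
    # Hypothetical image name and paths, for illustration only
    print(container_cmd("docker", "example/image:1.0", ["printenv", "HOME"],
                        "/work/ab/cd/home", 1000, 1000))
```

Both engines then see the same empty directory as `$HOME`, so neither reads rc files from the real host home.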