hail icon indicating copy to clipboard operation
hail copied to clipboard

[batch] Optionally encode an iterable of resources as a file of filenames

Open danking opened this issue 4 months ago • 1 comments

What happened?

On GNU/Linux, when spawning a process, there are practical limits to the size of the arguments array and the environment. These limits are documented in the execve man page. Bioinformatics tools sometimes work around this limitation by accepting a single file which contains new-line separated arguments. For example: bcftools --file-list NAME ....

The concrete proposal was to provide an operation like hb.file_list:

j.command(f'bcftools --file-list {hb.file_list(resources_list)}')

The question is how to construct this file list. We could generate code to create the file but this would still fail if there were so many resources that the environment was too large.

A really resilient way to do this would be for the Batch worker to mount a list of filenames into the container as a file. That gives users a robust way to project a very large number of resource filenames into the container.

Version

0.2.128

Relevant log output

No response

danking avatar Feb 26 '24 16:02 danking