temp files not deleted until array job is completed
Hi all,
I am using toil to run a CWL array job that processes a long list of files. Each input is processed independently and produces an output, together with a lot of temp files that I do not need.
Toil gives me all the outputs only at the end of the array job, when all the temp files are deleted. Until then the temp files stay on disk (and can fill my filesystem if the job is long enough), and I have no access to the outputs.
Is there a way to tell Toil that I want to see the output for each input as soon as it is produced, and that it should not wait until the end of the array job to delete temp files?
@claudiodonati As far as I'm aware, there currently isn't an option that does what you're asking directly, though @mr-c might know of a way.
I suppose that, as a workaround, you might be able to run something like this: https://toil.readthedocs.io/en/latest/running/cwl.html#running-cwl-within-toil-scripts
That way each job should complete and delete its temp files before the next one runs.
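A minimal sketch of that idea, assuming the per-file step is available as its own CWL document and that there is one job file per input. The names `process_one.cwl`, `inputs/` and `results/` are hypothetical placeholders, not anything from your setup. It simply calls `toil-cwl-runner` once per input with `--outdir`, so each output should appear as soon as its run finishes and that run's temp files can be cleaned up before the next one starts. Note that this runs inputs one after another, so it gives up the parallelism of the array job.

```python
import subprocess
from pathlib import Path


def build_command(workflow, job_file, outdir, extra_args=()):
    """Build one toil-cwl-runner invocation for a single input."""
    return [
        "toil-cwl-runner",
        *extra_args,
        "--outdir", str(outdir),
        str(workflow),
        str(job_file),
    ]


def run_per_input(workflow, job_files, out_root):
    """Run the workflow once per job file, one after another."""
    for job_file in job_files:
        # One output directory per input, named after the job file.
        outdir = Path(out_root) / Path(job_file).stem
        outdir.mkdir(parents=True, exist_ok=True)
        # check=True stops the loop at the first failing run.
        subprocess.run(build_command(workflow, job_file, outdir), check=True)


# Hypothetical usage:
# run_per_input("process_one.cwl", sorted(Path("inputs").glob("*.yml")), "results")
```

Any extra runner options you need can be passed through `extra_args`; the sketch does not assume any beyond `--outdir`.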
This is a good option request though, and will need to be implemented eventually.
When using the file store, each job ought to be writing its output to the job store and cleaning up its temp files, with the data being exported from the job store at the end. It's going to be hard to work around not having enough space in the job store, as opposed to on the worker machine(s), but --bypassFileStore might let the output go directly to its final location in a way that works more like cwltool.