gatb-minia-pipeline

Flag to remove intermediate files

Open bstamps opened this issue 4 years ago • 4 comments

Hello,

Is there a flag/option to remove intermediate files (SAM, and the .glue files specifically) during a run of the pipeline? I'm running a large assembly and the folder is > 5 TB at the moment. I didn't see any options in the help or in this repo.

bstamps, Aug 22 '20 23:08

You can always remove the *.h5 files if your contig assemblies have finished ;)

Of course, I'm assuming this since you have reached the mapping stage.

I also tend to remove the glue files.

harish0201, Sep 01 '20 16:09

Yes, I do as well. I suppose this is more of an enhancement suggestion for the pipeline. During large assemblies, if you aren't watching for the right moment to remove the files, you may saturate the filesystem and cause the run to fail.
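In the meantime, a wrapper around the run could at least check free space before each stage. This is only a minimal sketch, assuming you launch the pipeline from your own Python wrapper; `has_free_space` is a hypothetical helper, not part of gatb-minia-pipeline:

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least `min_free_bytes` free.

    Hypothetical helper for a user-written wrapper script.
    """
    return shutil.disk_usage(path).free >= min_free_bytes
```

For example, `has_free_space(".", 500 * 1024**3)` would tell you whether roughly 500 GiB are still available on the filesystem holding the current directory; the threshold is arbitrary and depends on your dataset.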

bstamps, Sep 01 '20 20:09

I had the same issue and created a modified version with a --cleanup flag that removes the *.h5 and .glue files after each iteration.
If one of the developers wants to review the code, I can make a PR. Otherwise, I can just share the script if somebody ever needs it.
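For anyone who needs a stopgap before then, the general idea can be sketched as below. This is not the modified script mentioned above, just an illustration under stated assumptions: the glob patterns `*.h5` and `*.glue*` are guesses at how the intermediates are named in the output directory, and `find_intermediates` and `remove_intermediates` are hypothetical names. It defaults to a dry run so nothing is deleted until you have checked what matches.

```python
from pathlib import Path

# Assumed file-name patterns for intermediates; check them against your own run.
INTERMEDIATE_PATTERNS = ("*.h5", "*.glue*")


def find_intermediates(workdir, patterns=INTERMEDIATE_PATTERNS):
    """Return the regular files directly under `workdir` that match any pattern."""
    found = set()
    for pattern in patterns:
        found.update(p for p in Path(workdir).glob(pattern) if p.is_file())
    return sorted(found)


def remove_intermediates(workdir, patterns=INTERMEDIATE_PATTERNS, dry_run=True):
    """Delete matching files and return the number of bytes freed.

    With dry_run=True (the default) nothing is deleted and the return value
    is the number of bytes that would be freed.
    """
    freed = 0
    for path in find_intermediates(workdir, patterns):
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return freed
```

Only run it with `dry_run=False` once the contigs for that iteration have been written, since later steps may otherwise still need those files.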

soungalo, Jan 17 '22 08:01

The script would be appreciated on my end. Obviously I can't speak for the devs, but the ability to clean up those files after each iteration seems highly desirable when assembling very large datasets.

bstamps, Jan 18 '22 17:01