Too many input files for MultiQC
I ran the RNA-seq pipeline on 360 samples, and the SLURM submission of the MultiQC step failed with "Pathname of a file, directory or other parameter too long":
```
ERROR ~ Error executing process > 'multiqc'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:
  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long
```
The files `.command.stub` and `.command.sh` look normal, but `.command.run` is 11 MB, with many `ln` commands etc. So it might be something related to this bug: https://bugs.schedmd.com/show_bug.cgi?id=2198

@pditommaso - have you come across problems like this before? I guess this is because the MultiQC process is softlinking in a lot of files, which makes `.command.run` massive, so SLURM rejects it.
Ouch, 11 MB of input files! You can mitigate this problem by using a directory as the output instead of individual files. I mean, instead of having

```nextflow
output:
file "*_fastqc.{zip,html}" into fastqc_results
```

have the process save the files into a directory, e.g. `reports`, then

```nextflow
output:
file "reports" into fastqc_results
```
Yes, maybe we should profile how many files each channel going into MultiQC has. I suspect there are quite a few that aren't needed. For example, MultiQC only needs the zip file here, not the HTML. So we could make new MultiQC-specific channels that carry just these files to cut down the number.
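A hedged sketch of that idea (channel names are hypothetical, not the pipeline's actual code): filter the existing channel down to the zip archives before they reach MultiQC.

```nextflow
// Hypothetical sketch: MultiQC parses the FastQC zip archives, so drop the
// HTML reports before staging files into the MultiQC task.
fastqc_results
    .flatten()
    .filter { it.name.endsWith('.zip') }
    .set { fastqc_zips_for_multiqc }
```

Halving the staged file count this way would shrink the staging section of `.command.run` accordingly, though with hundreds of samples the wrapper may still be large.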
I'm wondering whether @olgabot had issues with this when doing her large-scale nf-core/rnaseq experiments on AWS - any ideas?
I ran the RNA-seq pipeline on 576 FASTQ files and the SLURM submission also failed on the MultiQC process with the same error:

```
sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long
```

There is no `.command.out` in `work/`. Is there any update on a workaround? Thank you.
FYI: a user just encountered the same error in nf-core/eager when trying to run a 1000-sample job. If I understand the solution proposed above correctly, I don't think the directory output would necessarily work in this case, as most of the log files here are standalone files from separate processes (rather than lots of logs from a single process).
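To illustrate the point (channel names are hypothetical, not eager's actual code): the MultiQC input is typically assembled by mixing single-file outputs from many independent processes, so there is no one upstream task that could bundle them all into a directory.

```nextflow
// Hypothetical sketch: logs arrive one or two at a time from separate
// processes, so a per-process directory output does not reduce the total
// number of files staged into the MultiQC task.
multiqc_input = fastqc_results
    .mix(adapter_removal_logs, damageprofiler_logs, dedup_logs)
    .collect()
```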
I had this a few days ago and opened https://github.com/nextflow-io/nextflow/issues/2118 covering some of the points.
Just for the record, we've now also had this issue with nf-core/airrflow.
Re the nf-core/airrflow issue @ggabernet just mentioned: I can confirm the `.command.run` file size exceeds the SLURM `max_script_size` reported by `scontrol show config`. There are many `rm` and `ln` lines in the `nxf_stage()` section.
The Nextflow issue is still open, and the small-scale mitigation attempts did not permanently help us either. Maybe also comment there to make sure this gets addressed soon 👉🏻 https://github.com/nextflow-io/nextflow/issues/2852
Same issue on nf-core/proteinfold: softlinking `mmcif_files` - about 210,342 lines of softlinking?! This should be better once https://github.com/nextflow-io/nextflow/issues/2852 is addressed.