Too many input files for MultiQC
I ran the RNA-seq pipeline on 360 samples, and the SLURM submission of the MultiQC step failed with "Pathname of a file, directory or other parameter too long":
```
ERROR ~ Error executing process > 'multiqc'

Caused by:
  Failed to submit process to grid scheduler for execution

Command executed:
  sbatch .command.run

Command exit status:
  1

Command output:
  sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long
```
The files `.command.stub` and `.command.sh` look normal, but `.command.run` is 11 MB, with many `ln` commands etc. So it might be something related to this bug: https://bugs.schedmd.com/show_bug.cgi?id=2198

@pditommaso - have you come across problems like this before? I guess this is because the MultiQC process is softlinking in a lot of files, which makes `.command.run` massive, so SLURM rejects it.
Ouch, 11 MB of input files! You can mitigate this problem by using a directory as the output instead of individual files. I mean, instead of having

```nextflow
output:
file "*_fastqc.{zip,html}" into fastqc_results
```

have the process save the files into a directory, e.g. `reports`, then

```nextflow
output:
file "reports" into fastqc_results
```
Yes, maybe we should profile how many files each channel going into MultiQC has. I suspect there are quite a few that aren't needed. For example, MultiQC only needs the zip file here, not the HTML. So we could make new MultiQC-specific channels that carry just these files to cut down the number.
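A hedged sketch of that idea (channel names are hypothetical, not the pipeline's actual code): filter the existing channel down to the zip archives before they reach MultiQC.

```nextflow
// Hypothetical sketch: MultiQC parses the FastQC zip archives, so drop the
// HTML reports before staging files into the MultiQC task.
fastqc_results
    .flatten()
    .filter { it.name.endsWith('.zip') }
    .set { fastqc_zips_for_multiqc }
```

Halving the staged file count this way would shrink the staging section of `.command.run` accordingly, though with hundreds of samples the wrapper may still be large.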
I'm wondering whether @olgabot had issues with this when doing her large-scale nf-core/rnaseq experiments on AWS - any ideas?
I ran the RNA-seq pipeline on 576 FASTQ files and the SLURM submission also failed on the MultiQC process with the same error:

```
sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long
```

There is no `.command.out` in `work/`. Is there any update on a workaround? Thank you.
FYI: a user just encountered the same error in nf-core/eager when trying to run a 1000-sample job. If I understand the solution proposed above correctly, I don't think the directory output would necessarily work in this case, as most of the log files here are standalone files from separate processes (rather than lots of logs from a single process).
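To illustrate the point (channel names are hypothetical, not eager's actual code): the MultiQC input is typically assembled by mixing single-file outputs from many independent processes, so there is no one upstream task that could bundle them all into a directory.

```nextflow
// Hypothetical sketch: logs arrive one or two at a time from separate
// processes, so a per-process directory output does not reduce the total
// number of files staged into the MultiQC task.
multiqc_input = fastqc_results
    .mix(adapter_removal_logs, damageprofiler_logs, dedup_logs)
    .collect()
```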
I had this a few days ago and opened https://github.com/nextflow-io/nextflow/issues/2118 covering some of the points.
Just for the record, we've now also had this issue with nf-core/airrflow.
Re the nf-core/airrflow issue @ggabernet just mentioned: I can confirm the `.command.run` file size exceeds the SLURM `max_script_size` reported by `scontrol show config`. There are many `rm` and `ln` lines in the `nxf_stage()` section.
The Nextflow issue is still open, and the small-scale mitigation attempts did not permanently help us either. Maybe also comment there to make sure this gets addressed soon 👉🏻 https://github.com/nextflow-io/nextflow/issues/2852
Same issue on nf-core/proteinfold: softlinking `mmcif_files` - about 210,342 lines of softlinking?! This should be better once https://github.com/nextflow-io/nextflow/issues/2852 is addressed.