
Support aggregate reporting for demultiplexed FASTQ files

Open mtomko opened this issue 2 years ago • 2 comments

Our group has long generated FastQC reports for a single lane of sequencing at a time. Our sequencing provider now supplies only demultiplexed FASTQs, which means that we need to look at hundreds of FastQC reports instead of just 2. We would be interested in a FastQC option that generates one aggregate report for all of the demultiplexed FASTQ files, summarizing their overall quality. This would be akin to the report generated by simply concatenating all the FASTQ files and running FastQC on the result.

I would consider implementing this myself if it would be welcome.

mtomko avatar Oct 17 '23 17:10 mtomko

Ah, my coworker has pointed out that it's possible to do this by reading from standard input. From the FastQC help text:

If you want to run fastqc on a stream of data to be read from standard input then you can do this by specifying 'stdin' as the name of the file to be processed and then streaming uncompressed fastq format data to the program. For example:

zcat *fastq.gz | fastqc stdin

If you want the results from a streamed analysis sent to a file with a name other than stdin then you can add a colon and put the file name you want, for example:

zcat *fastq.gz | fastqc stdin:my_results

...would write results to my_results.html and my_results.zip.
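For anyone who wants this as a script, here is a minimal sketch of the same streaming approach. The function name `stream_fastq`, the `FASTQ_DIR` variable, and the `lane1_all` label are made up for illustration; only `fastqc stdin:<name>` comes from the help text quoted above.

```shell
#!/bin/sh
# Sketch: stream every demultiplexed FASTQ in a directory into one FastQC run.
# stream_fastq, FASTQ_DIR and the label "lane1_all" are hypothetical names.

# Decompress all *.fastq.gz files in a directory to stdout, in glob order.
stream_fastq() {
    for f in "$1"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        gzip -cd "$f"             # portable spelling of zcat
    done
}

# Only run FastQC when it is installed and FASTQ_DIR has been set.
if command -v fastqc >/dev/null 2>&1 && [ -n "${FASTQ_DIR:-}" ]; then
    stream_fastq "$FASTQ_DIR" | fastqc stdin:lane1_all
fi
```

Note that a plain `*.fastq.gz` glob pools everything it matches. If the directory holds both read 1 and read 2 files, a narrower pattern (for example one matching only the R1 files, assuming the provider's naming allows it) is probably needed to keep one report per read.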

mtomko avatar Oct 17 '23 17:10 mtomko

You've found one option for this, which will combine the full set of results. To be honest, if you're just looking at data quality, it's pretty unlikely that you'll see a difference in quality between the different split subsets of reads, so any one of the reports is likely to be representative.

The other option to consider is MultiQC (https://multiqc.info/), which you can run in a directory containing multiple reports from FastQC (and other programs); it will aggregate them into a single combined report. We use this at the end of our sequencing pipelines and it works great for this purpose.
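A rough sketch of that MultiQC workflow, for reference. The function names (`run`, `aggregate_qc`), the `DRY_RUN` switch, and the `fastqc_out` / `multiqc_out` directory names are assumptions for illustration; it relies only on the `-o` output-directory option that both `fastqc` and `multiqc` accept.

```shell
#!/bin/sh
# Sketch: per-file FastQC reports, then one MultiQC summary.
# run, aggregate_qc, DRY_RUN, fastqc_out and multiqc_out are hypothetical names.

# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# aggregate_qc <fastq_dir> <work_dir>
aggregate_qc() {
    fastq_dir=$1
    work_dir=$2
    mkdir -p "$work_dir/fastqc_out" || return 1
    # One FastQC report per demultiplexed file.
    run fastqc -o "$work_dir/fastqc_out" "$fastq_dir"/*.fastq.gz || return 1
    # MultiQC scans the directory for reports and writes a combined one.
    run multiqc "$work_dir/fastqc_out" -o "$work_dir/multiqc_out"
}
```

Something like `aggregate_qc ./fastq ./qc` would then leave the combined report under `./qc/multiqc_out`, while the individual FastQC reports stay available in `./qc/fastqc_out` for any sample that looks odd.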

s-andrews avatar Oct 18 '23 08:10 s-andrews