
JSON File (v0.23.1): Understanding after_filtering vs. filtering_result

g-pacheco opened this issue.

Dear Fastp,

I would like to kindly ask for some clarification regarding the results in the JSON report. Specifically, what is the difference between total_reads under after_filtering and passed_filter_reads under filtering_result? My intuition is that these two numbers should be the same, and I believe they are the same in your example. In my report, however, they differ, and I do not understand why.

```json
        "sequencing": "paired end (150 cycles + 150 cycles)",
        "before_filtering": {
            "total_reads":6510970,
            "total_bases":976645500,
            "q20_bases":920167817,
            "q30_bases":853405681,
            "q20_rate":0.942172,
            "q30_rate":0.873813,
            "read1_mean_length":150,
            "read2_mean_length":150,
            "gc_content":0.440688
        },
        "after_filtering": {
            "total_reads":1176646,
            "total_bases":183937612,
            "q20_bases":179080377,
            "q30_bases":170016138,
            "q20_rate":0.973593,
            "q30_rate":0.924314,
            "read1_mean_length":156,
            "gc_content":0.434624
        }
    },
    "filtering_result": {
        "passed_filter_reads": 6312238,
        "corrected_reads": 256148,
        "corrected_bases": 621598,
        "low_quality_reads": 172442,
        "too_many_N_reads": 2,
        "low_complexity_reads": 1930,
        "too_short_reads": 24358,
        "too_long_reads": 0
    }
```

I have been using MultiQC to aggregate the Fastp results, and I would like to make sure it is reading the correct numbers from the Fastp report.

Please let me know should you require any further information from my end.

Thanks in advance, George.

g-pacheco, Oct 20 '21
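One way to see which of the two numbers accounts for every input read is to add up the filtering_result outcome counts and compare the sum with before_filtering total_reads. The Python sketch below does this with the values posted above. Placing before_filtering and after_filtering under a `summary` key is an assumption about the enclosing object, which the excerpt cuts off, and the helper `read_counts` and the `OUTCOME_KEYS` grouping are hypothetical, not part of fastp.

```python
import json

# Values copied from the report excerpt above. The "summary" nesting is an
# assumption about the enclosing object, which the excerpt does not show.
REPORT = json.loads("""
{
  "summary": {
    "before_filtering": {"total_reads": 6510970, "total_bases": 976645500},
    "after_filtering": {"total_reads": 1176646, "total_bases": 183937612}
  },
  "filtering_result": {
    "passed_filter_reads": 6312238,
    "corrected_reads": 256148,
    "low_quality_reads": 172442,
    "too_many_N_reads": 2,
    "low_complexity_reads": 1930,
    "too_short_reads": 24358,
    "too_long_reads": 0
  }
}
""")

# Assumed pass/fail buckets, each input read counted in exactly one of them.
# corrected_reads is left out: base correction is not a pass/fail outcome.
OUTCOME_KEYS = (
    "passed_filter_reads",
    "low_quality_reads",
    "too_many_N_reads",
    "low_complexity_reads",
    "too_short_reads",
    "too_long_reads",
)


def read_counts(report):
    """Return (input reads, bucket sum, after_filtering reads, passed reads)."""
    summary = report["summary"]
    result = report["filtering_result"]
    total_in = summary["before_filtering"]["total_reads"]
    bucket_sum = sum(result[key] for key in OUTCOME_KEYS)
    after = summary["after_filtering"]["total_reads"]
    passed = result["passed_filter_reads"]
    return total_in, bucket_sum, after, passed


total_in, bucket_sum, after, passed = read_counts(REPORT)
print("buckets add up to input:", bucket_sum == total_in)
print("after_filtering matches passed_filter_reads:", after == passed)
```

With the posted numbers, the six buckets sum to exactly 6,510,970, the before_filtering total, while after_filtering total_reads (1,176,646) differs from passed_filter_reads (6,312,238). That suggests passed_filter_reads is the count that accounts for every input read.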

Same issue here. MultiQC calculates the fraction of reads passing filter as the ratio of after_filtering total_reads to before_filtering total_reads. In the example above, that gives 18.07% (1176646 / 6510970), when the correct value appears to be 96.95% (6312238 / 6510970).

@sfchen is this intended behaviour?

thanks,

fgvieira, Oct 26 '21
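The two percentages quoted above can be reproduced in a few lines of Python. This sketches the arithmetic only: it assumes, as the comment states, that MultiQC divides after_filtering total_reads by before_filtering total_reads, and it does not reproduce MultiQC's own code. The function name `percent` is hypothetical.

```python
def percent(numerator, denominator):
    """Percentage rounded to two decimals."""
    return round(100 * numerator / denominator, 2)


before_total = 6510970  # before_filtering total_reads
after_total = 1176646   # after_filtering total_reads
passed_total = 6312238  # filtering_result passed_filter_reads

# Ratio as described in the comment (after_filtering / before_filtering).
after_based = percent(after_total, before_total)
# Same denominator, but using the filtering_result count instead.
passed_based = percent(passed_total, before_total)

print(f"after_filtering-based: {after_based}%")    # 18.07%
print(f"passed_filter-based:   {passed_based}%")   # 96.95%
```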

Hello @sfchen, I was wondering if you have had the time to look into this.

Thanks in advance, George.

g-pacheco, Nov 02 '21

+1. @sfchen, this seems to be a bug: it looks like fastp reports only the number of merged reads in this field, when intuitively it should be all reads passing filter. The total_bases field has the same issue.

schorlton, Nov 21 '22
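The merged-reads explanation can be checked against the posted numbers alone. Trimming and filtering only shorten reads, so a mean length above the 150-cycle read length means after_filtering is counting reads longer than any single raw read, which is what merging overlapping pairs would produce. A minimal Python sketch, where `mean_length` is a hypothetical helper and the two dicts hold only the fields copied from the report:

```python
READ_LENGTH = 150  # from "paired end (150 cycles + 150 cycles)"

before_filtering = {"total_reads": 6510970, "total_bases": 976645500}
after_filtering = {"total_reads": 1176646, "total_bases": 183937612}


def mean_length(section):
    """Mean read length implied by a before/after_filtering section."""
    return section["total_bases"] / section["total_reads"]


before_len = mean_length(before_filtering)
after_len = mean_length(after_filtering)

print(f"before: {before_len:.1f} bp, after: {after_len:.1f} bp")
# Trimming cannot lengthen a read, so a mean above READ_LENGTH implies the
# after_filtering counts include reads longer than any single raw read.
print("after mean exceeds raw read length:", after_len > READ_LENGTH)
```

This prints a mean of 150.0 bp before filtering and about 156.3 bp after, in line with the report's read1_mean_length of 156 and the missing read2_mean_length field under after_filtering.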