Out-of-order paired fastq does not raise an error
Hi,
If paired fastq files are out of order, fastp seems to continue as normal, even though the pairing assumption is not satisfied. That assumption matters, e.g. for dropping both reads when one read of a pair fails and --unpaired1 is not set.
I was hoping to just add a read-name check to the PE processing loop:
https://github.com/OpenGene/fastp/blob/7a0acececad43229ff64a45ff033a44fd99655f4/src/peprocessor.cpp#L383-L386
if (*or1->mName != *or2->mName) {
    // skip the pair or raise an error
}
But the mName for a read contains the whole header line (including the read index, or the /1 or /2 suffix used by e.g. the "MGI" format), which means many correctly paired reads won't have equal names. One option is to compare only the substring up to the first space, but the /1 or /2 suffix would still need handling, so this gets messy.
Realistically this should be fixed upstream, but it would make sense for a QC step to flag it as well.
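To make the idea concrete, here is a rough sketch of the normalization I have in mind. pairKey and sameFragment are hypothetical helpers, not existing fastp functions, and the sketch assumes the only differences between mates are a comment after the first whitespace and an optional trailing /1 or /2. Other naming schemes would need more cases.

```cpp
#include <string>

// Hypothetical helper (not part of fastp): reduce a FASTQ header line to the
// part that is expected to be identical for both mates of a pair.
std::string pairKey(const std::string& name) {
    // Skip a leading '@' if the stored name still carries it.
    std::size_t start = (!name.empty() && name[0] == '@') ? 1 : 0;

    // Keep everything up to the first space or tab; this drops a trailing
    // comment such as "1:N:0:ATCACG".
    std::size_t end = name.find_first_of(" \t", start);
    if (end == std::string::npos) end = name.size();

    // Drop a trailing "/1" or "/2" mate suffix if present.
    if (end - start >= 2 && name[end - 2] == '/' &&
        (name[end - 1] == '1' || name[end - 1] == '2')) {
        end -= 2;
    }
    return name.substr(start, end - start);
}

// Hypothetical helper: true if both header lines appear to name the same fragment.
bool sameFragment(const std::string& name1, const std::string& name2) {
    return pairKey(name1) == pairKey(name2);
}
```

The check in the loop would then become something like if (!sameFragment(*or1->mName, *or2->mName)) { ... }, again only as an assumption about how it might be wired in.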
Best, Alex
It's easy to implement this, but why would the paired fastq files get out of order?
Presumably because of some earlier mistake in handling the files. This should not happen (it is fair to assume R1/R2 are in matched order), but it seems like something that could be flagged during QC, and the check probably wouldn't affect runtime much.