FastQC
Attempt to detect sorted input files
There is a known issue with the duplication and overrepresented sequences modules when the input file is sorted, and I received a suggestion to flag sorted input files. We could do this based on position if we had BAM input, but on a smaller scale we could probably detect it in the duplication plot, because there would be a tendency for duplicates of a given sequence to occur close together in the file (this might not be true for repetitive sequences).
It might be worth thinking about whether we can spot this behaviour, so that we can add a warning to the output.
Per email conversation, the issue arises with coordinate-sorted inputs, as opposed to collated/sorted by read name.
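As a rough illustration of the idea, here is a minimal Python sketch of one possible heuristic: for every repeat of a sequence, check whether the previous copy was seen within a few reads. In a coordinate-sorted file most repeats should be close to their previous copy, whereas in an unsorted file they should be spread out. The function name `clustered_duplicate_fraction` and the `window` parameter are hypothetical and not part of FastQC (which is written in Java), and any cut-off for raising a warning would need to be chosen empirically.

```python
from typing import Dict, Iterable, Optional


def clustered_duplicate_fraction(
    sequences: Iterable[str], window: int = 10
) -> Optional[float]:
    """Return the fraction of repeat occurrences that fall within
    `window` reads of the previous copy of the same sequence.

    Returns None when no sequence is seen more than once.
    Hypothetical helper for illustration only, not FastQC code.
    """
    last_seen: Dict[str, int] = {}  # sequence -> index of its latest occurrence
    repeats = 0    # occurrences of a sequence that was already seen
    clustered = 0  # repeats whose previous copy was at most `window` reads back

    for index, seq in enumerate(sequences):
        previous = last_seen.get(seq)
        if previous is not None:
            repeats += 1
            if index - previous <= window:
                clustered += 1
        last_seen[seq] = index

    if repeats == 0:
        return None
    return clustered / repeats


if __name__ == "__main__":
    # Toy inputs (made up): the same reads in sorted and interleaved order.
    sorted_like = ["AAAA", "AAAA", "AAAA", "CCCC", "CCCC", "GGGG"]
    interleaved = ["AAAA", "CCCC", "GGGG", "AAAA", "CCCC", "AAAA"]
    print(clustered_duplicate_fraction(sorted_like, window=1))
    print(clustered_duplicate_fraction(interleaved, window=1))
```

A value close to 1 over the first chunk of reads would hint at sorted input. Note that this only tracks exact sequence matches and keeps every distinct sequence in memory, so a real check would need to cap the number of reads examined, and as mentioned above it could be misled by repetitive sequences.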