fastp
In overrepresented sequence analysis, it seems the loop condition should be i<=len-step instead of i<len-step
Hi! I am using the latest fastp and found the following in state.cpp:
    // do overrepresentation analysis for 1 of every 100 reads
    if(mOptions->overRepAnalysis.enabled) {
        if(mReads % mOptions->overRepAnalysis.sampling == 0) {
            const int steps[5] = {10, 20, 40, 100, min(150, mEvaluatedSeqLen-2)};
            for(int s=0; s<5; s++) {
                int step = steps[s];
                for(int i=0; i<len-step; i++) {
                    string seq = r->mSeq->substr(i, step);
                    if(mOverRepSeq.count(seq)>0) {
                        mOverRepSeq[seq]++;
                        for(int p = i; p < seq.length() + i && p < mEvaluatedSeqLen; p++) {
                            mOverRepSeqDist[seq][p]++;
                        }
                        i+=step;
                    }
                }
            }
        }
    }
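In this line, for(int i=0; i<len-step; i++), it seems the condition should be i<=len-step instead of i<len-step.
With <, the last window of length step, the one starting at i = len-step, is never checked, and when the read length equals step the loop body never runs at all. This seems to cause the number of hotseqs found during preprocessing to end up as 0.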
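The difference between the two bounds can be checked in isolation. The following is only a minimal sketch, not fastp code: `scannedWindows` is a hypothetical helper that collects the substrings each loop bound would look up, assuming the lookup key is simply `substr(i, step)` as in the snippet above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of fastp): collect every substring of
// length `step` that the scan would look up.
// inclusive == false mimics the current bound   i <  len - step
// inclusive == true  mimics the proposed bound  i <= len - step
std::vector<std::string> scannedWindows(const std::string& seq, int step, bool inclusive) {
    assert(step > 0);
    std::vector<std::string> windows;
    const int len = static_cast<int>(seq.length());
    // Index of the last start position that the chosen bound still visits.
    const int last = inclusive ? len - step : len - step - 1;
    for (int i = 0; i <= last; i++) {
        windows.push_back(seq.substr(i, step));
    }
    return windows;
}
```

For a 10-base read and step 10, the exclusive bound visits no window at all, while the inclusive bound visits exactly one (the whole read). For step 4, the exclusive bound stops one window short and never sees the final 4 bases.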
Incidentally, why is there an i+=step after a match? Is it because occurrences of overrepresented sequences are not supposed to overlap?
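To make the effect of that skip concrete, here is a small sketch, again not fastp code: `countHits` is a hypothetical helper that counts occurrences of a single target sequence with and without the skip. It uses the inclusive bound `i <= len - step`, which is an assumption. Note that after `i += step` the loop's own `i++` still runs, so the next window starts at `i + step + 1` rather than `i + step`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not fastp code): count occurrences of `target` in `seq`.
// skipAfterMatch == true mimics the `i += step` in the quoted loop.
int countHits(const std::string& seq, const std::string& target, bool skipAfterMatch) {
    assert(!target.empty());
    const int len = static_cast<int>(seq.length());
    const int step = static_cast<int>(target.length());
    int hits = 0;
    for (int i = 0; i <= len - step; i++) {
        if (seq.compare(i, step, target) == 0) {
            hits++;
            if (skipAfterMatch) {
                i += step;  // the for loop's i++ then advances one more position
            }
        }
    }
    return hits;
}
```

With the read "AAAAAAAA" and target "AA", counting every position gives 7 hits. With the skip, matches are found at positions 0, 3 and 6, giving 3 hits, one fewer than the 4 that strictly non-overlapping matches at 0, 2, 4 and 6 would give.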
Thank you!