Issue about Read Count Differences When Using -e and --overlap Options

Open hyoj-seo opened this issue 9 months ago • 1 comments

Hi, I'm using your program to perform demultiplexing with linked indices.

While testing different options to determine the best settings for my index, I came across something I wanted to ask.

My indices are paired as 8 bp forward and 8 bp reverse, so I used the linked index structure as {forward}...{reverse_rc}.

I expected to get fewer reads with -e 0.0 --rc (allowing 0 mismatches) compared to -e 0.2 --rc (allowing 1 mismatch).

However, I found that in some samples, the number of reads actually increased when using -e 0.0 --rc.

Additionally, when I used -e 0.0 --overlap 8 --rc, I consistently got fewer reads per sample compared to -e 0.2.

From my perspective, the behavior of -e 0.0 --overlap 8 --rc seems more reasonable. But what exactly is the difference between using -e 0.0 alone and using it together with --overlap 8, and why might I be seeing this kind of result?

Thanks in advance. HJ

hyoj-seo, Apr 07 '25 07:04

Hi, can you please clarify how your reads look? That is, where exactly are the indices in the read? Do the reads start with the index sequence? If so, you should use an anchored adapter, that is, add ^ at the beginning of the index sequence.
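As a rough illustration of the anchoring suggestion, the following Python sketch builds a linked adapter string of the form {forward}...{reverse_rc} with a leading ^. The index sequences and the helper names (`reverse_complement`, `linked_adapter`) are made up for this example and are not part of cutadapt; only the `^` prefix and the `...` separator come from the discussion above.

```python
# Hypothetical 8 bp index pair; replace with your own sequences.
FORWARD = "ACGTACGT"
REVERSE = "TTGCAAGC"

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def linked_adapter(forward: str, reverse: str, anchored: bool = True) -> str:
    """Build '{forward}...{reverse_rc}', optionally anchored with '^'."""
    prefix = "^" if anchored else ""
    return f"{prefix}{forward}...{reverse_complement(reverse)}"


print(linked_adapter(FORWARD, REVERSE))
# prints: ^ACGTACGT...GCTTGCAA
```

The resulting string is what would be passed as the adapter sequence; whether it belongs with -a or -g depends on the read layout, which is the question asked below.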

For 8 bp long indices, you definitely should use the program in such a way that only full-length matches are allowed (no partial matches). You can achieve that with --overlap 8 or by using an anchored adapter type (which implicitly prevents partial matches).
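To make the interaction between -e and --overlap more concrete, here is a small Python sketch of the arithmetic as described in the cutadapt documentation: the number of allowed errors is the error rate times the length of the matching part, rounded down, and a match must be at least as long as the minimum overlap (3 by default). This is a simplified model for illustration, not cutadapt's actual code, and the function names are invented here.

```python
DEFAULT_MIN_OVERLAP = 3  # documented default of --overlap


def allowed_errors(error_rate: float, match_length: int) -> int:
    """Errors tolerated for a match of the given length (rounded down)."""
    return int(error_rate * match_length)


def passes_length_check(match_length: int,
                        min_overlap: int = DEFAULT_MIN_OVERLAP) -> bool:
    """True if a (possibly partial) match is long enough to be considered."""
    return match_length >= min_overlap


# Compare settings for partial and full-length matches of an 8 bp index.
for length in (3, 5, 8):
    print(
        f"match length {length}: "
        f"errors allowed at -e 0.2 = {allowed_errors(0.2, length)}, "
        f"at -e 0.0 = {allowed_errors(0.0, length)}, "
        f"accepted with default overlap = {passes_length_check(length)}, "
        f"with --overlap 8 = {passes_length_check(length, 8)}"
    )
```

Under this model, a 3 bp partial match with zero errors still passes the default length check at -e 0.0, whereas --overlap 8 (or anchoring) leaves only full-length matches of the 8 bp index.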

Do you use -a or -g to specify the linked adapters?

> I expected to get fewer reads with -e 0.0 --rc (allowing 0 mismatches) compared to -e 0.2 --rc (allowing 1 mismatch).
>
> However, I found that in some samples, the number of reads actually increased when using -e 0.0 --rc.

This is strange and I don’t have an explanation right away.

marcelm, Apr 09 '25 07:04