
Remove instead of cut adapter

Open EivindStensrud opened this issue 1 year ago • 2 comments

Hi, I am wondering whether Cutadapt could offer an option to remove only the primer adapter sequence, without cutting the read at that position. I would like to keep the sequence information that lies in front of the adapter I want to remove.

Ex: Either unmerged reads INTERESTINGadapterAMPLICON -> INTERESTINGAMPLICON

or merged reads INTERESTINGadapterAMPLICONreverseadaptor -> INTERESTINGAMPLICON

I am aware that an AWK script could be used, but I think this would be a nice addition to the package.
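For illustration, the requested behaviour can be sketched in a few lines of Python instead of AWK. This is only a minimal sketch: `remove_internal` and `strip_suffix` are hypothetical helper names, not part of Cutadapt, and the matching is exact, whereas Cutadapt's own adapter search tolerates errors. The quality string is sliced together with the sequence so the two stay aligned.

```python
def remove_internal(seq: str, qual: str, adapter: str) -> tuple[str, str]:
    """Delete the first exact occurrence of `adapter` and join both flanks.

    Hypothetical helper: exact matching only, no error tolerance.
    """
    pos = seq.find(adapter)
    if pos == -1:
        # Adapter not found: return the read unchanged.
        return seq, qual
    end = pos + len(adapter)
    # Apply the same cut to the qualities so they stay in step with the bases.
    return seq[:pos] + seq[end:], qual[:pos] + qual[end:]


def strip_suffix(seq: str, qual: str, adapter: str) -> tuple[str, str]:
    """Remove `adapter` only if the read ends with it (merged-read case)."""
    if adapter and seq.endswith(adapter):
        keep = len(seq) - len(adapter)
        return seq[:keep], qual[:keep]
    return seq, qual


# Unmerged read from the example above.
seq = "INTERESTINGadapterAMPLICON"
qual = "I" * 11 + "#" * 7 + "F" * 8
unmerged_seq, unmerged_qual = remove_internal(seq, qual, "adapter")

# Merged read: remove the internal adapter, then the trailing reverse adapter.
merged = "INTERESTINGadapterAMPLICONreverseadaptor"
merged_seq, merged_qual = remove_internal(merged, "I" * len(merged), "adapter")
merged_seq, merged_qual = strip_suffix(merged_seq, merged_qual, "reverseadaptor")
```

Both calls yield `INTERESTINGAMPLICON`, a sequence that does not occur anywhere in the original read.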

Regards Eivind

EivindStensrud, Jul 06 '23 10:07

Hi, thanks for the suggestion. Can you explain a bit why you think this would be useful? I haven’t encountered a situation where that would make much sense, mostly because I think the resulting sequence would no longer be based in reality but would be something artificial. So far, all sequence modifications in Cutadapt remove only a prefix and/or a suffix of the input sequence. One can then argue that this only changes one’s "view" of the sequence, but doesn’t actually change it.

The principle that the output is a substring of the input (that is, it is fully described by a start and end coordinate within the original sequence) is engrained quite deeply in Cutadapt, and changing this would require some effort.
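To make the principle concrete, here is a minimal sketch of the "view" idea. It is not Cutadapt's actual implementation; the `View` class and its method names are hypothetical and only illustrate why prefix and suffix trimming fit into a single (start, end) pair while removing an internal segment does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A trimmed read described as a start/end window on the original sequence."""

    original: str
    start: int
    end: int

    def trim_prefix(self, n: int) -> "View":
        # Removing a prefix only moves the start coordinate to the right.
        return View(self.original, min(self.start + n, self.end), self.end)

    def trim_suffix(self, n: int) -> "View":
        # Removing a suffix only moves the end coordinate to the left.
        return View(self.original, self.start, max(self.end - n, self.start))

    @property
    def sequence(self) -> str:
        return self.original[self.start:self.end]


read = "INTERESTINGadapterAMPLICON"
full = View(read, 0, len(read))

# Cutting everything up to and including the adapter is still one window.
amplicon = full.trim_prefix(len("INTERESTINGadapter"))

# Trimming from both ends composes into another single window.
middle = full.trim_prefix(len("INTERESTING")).trim_suffix(len("AMPLICON"))

# The requested output is not a substring of the read, so no single
# (start, end) pair can describe it; it would need two windows.
is_substring = "INTERESTINGAMPLICON" in read
```

Here `amplicon.sequence` is `AMPLICON`, `middle.sequence` is `adapter`, and `is_substring` is `False`.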

marcelm, Jul 06 '23 12:07

Without going into too much detail, I am working with unique molecular identifiers (UMIs), and this addition could potentially streamline UMI-based error correction models, which correct for PCR- and sequencing-induced errors independently for every DNA template molecule. With this approach, we could avoid having to move the UMI to the header, remove the primer adapters, and finally move the UMI back onto the sequence.

Regards

EivindStensrud, Jul 06 '23 18:07