how to clip a variable *end* 5' adapter?

Open yfarjoun opened this issue 7 months ago • 5 comments

I have a library that has a variable-length ADAPTER at the 5' end of the read, but unlike what cutadapt expects, the variability is at the end of the ADAPTER. So, for example, both

ADAPTERsequence and ADAPsequence

are possible reads.

I tried using cutadapt with -g ADAPTER, but this didn't work for the second example, so I ended up building a FASTA file:

>ADAPTER-1
^ADAPTER
>ADAPTER-2
^ADAPTE
>ADAPTER-3
^ADAPT
>ADAPTER-4
^ADAP
>ADAPTER-5
^ADA

The sequences are anchored with ^ because I know that the read will start with the start of the ADAPTER.
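For longer adapters, a file like this is tedious to write by hand. The following is a minimal sketch of how it could be generated, assuming every truncation down to some minimum length is acceptable. The function name `truncated_adapter_fasta` and the `min_length` parameter are made up for illustration and are not part of Cutadapt.

```python
def truncated_adapter_fasta(adapter: str, min_length: int = 3,
                            name: str = "ADAPTER") -> str:
    """Return FASTA text with one anchored 5' adapter per allowed truncation.

    Hypothetical helper, not part of Cutadapt. Each record keeps the start
    of the adapter and drops bases from its end; the leading '^' anchors
    the sequence at the start of the read.
    """
    if not 1 <= min_length <= len(adapter):
        raise ValueError("min_length must be between 1 and len(adapter)")
    records = []
    # Longest version first, then one base shorter each time.
    for i, length in enumerate(range(len(adapter), min_length - 1, -1), start=1):
        records.append(f">{name}-{i}\n^{adapter[:length]}\n")
    return "".join(records)


# "ADAPTER" is the placeholder used in this thread, not a real sequence.
fasta_text = truncated_adapter_fasta("ADAPTER", min_length=3)
print(fasta_text, end="")
```

Saved to a file such as adapters.fasta, the output can be passed to cutadapt with -g file:adapters.fasta.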

I ran cutadapt with --no-indels because it complained that the different sequences are too similar.

Long story short: it worked, but (a) it feels wrong and (b) I don't like the fact that I need to use --no-indels.

So, my questions are:

Am I missing something? And if not, do you think it would be difficult to include this kind of matching in cutadapt? Thanks!

yfarjoun avatar May 20 '25 19:05 yfarjoun

Yeah, this is actually the only way to do this at the moment. Note that you can omit --no-indels and simply ignore the warning; there shouldn’t be any downsides to it in this case. Also, although it feels inefficient to provide multiple sequences like this, it shouldn’t be that bad because Cutadapt creates an index if you search for multiple anchored 5' adapters, so it doesn’t have to search for the adapters individually.
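To illustrate why an index makes multiple anchored 5' adapters cheap, here is a simplified sketch. It is not Cutadapt's actual implementation: it only handles exact matches, whereas the real index also has to account for allowed errors. The function names are invented for this example.

```python
def build_prefix_index(adapters: dict[str, str]) -> tuple[dict[str, str], list[int]]:
    """Map each adapter sequence to its name and collect the distinct lengths.

    Illustrative only: exact matches, no mismatches or indels.
    """
    index = {seq: name for name, seq in adapters.items()}
    lengths = sorted({len(seq) for seq in adapters.values()}, reverse=True)
    return index, lengths


def trim_anchored_prefix(read: str, index: dict[str, str], lengths: list[int]):
    """Return (adapter_name, trimmed_read), or (None, read) if nothing matches.

    One dictionary lookup per distinct adapter length replaces a separate
    alignment against every adapter. Longer adapters are tried first.
    """
    for length in lengths:
        name = index.get(read[:length])
        if name is not None:
            return name, read[length:]
    return None, read


# Placeholder sequences from this thread, not real adapter sequences.
adapters = {
    "ADAPTER-1": "ADAPTER",
    "ADAPTER-2": "ADAPTE",
    "ADAPTER-3": "ADAPT",
    "ADAPTER-4": "ADAP",
    "ADAPTER-5": "ADA",
}
index, lengths = build_prefix_index(adapters)
print(trim_anchored_prefix("ADAPTERsequence", index, lengths))
print(trim_anchored_prefix("ADAPsequence", index, lengths))
```

With real nucleotide sequences, trying the longest adapter first is a design choice of this sketch: a read whose insert happens to continue like the adapter would be trimmed too far.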

One reason this feature isn’t implemented is that no one has asked for it, as far as I can remember. But it also runs counter to a basic assumption that I made, which is that an adapter in principle always occurs in full; we only fail to see it fully because the read doesn’t extend far enough.

I’m happy to leave this issue open so that others who would also be interested can add their vote, but realistically, this isn’t going to happen for a while. It isn’t that difficult to implement algorithmically; the hardest part for me is coming up with the user interface. For example, I don’t know whether this would be a new command-line option or whether I’d have to add some syntax to the way adapters are specified.

marcelm avatar May 20 '25 19:05 marcelm

Thanks, @marcelm, for your work and very fast response. As a fellow developer of open-source software, I know how difficult it can be to find the time to answer all the questions.


Regarding API: I think that a modifier at the modifiable end of the ADAPTER would work, e.g. ADAPTER# for 5' adapters and #ADAPTER for 3' adapters.
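As a rough sketch of what such a modifier could mean, the function below expands the proposed notation into the anchored adapters that Cutadapt already understands (^ for anchored 5' adapters, $ for anchored 3' adapters). The # syntax is only a proposal in this thread and does not exist in Cutadapt. The function name, the `min_length` parameter, and the reading of #ADAPTER as an adapter anchored at the read end with bases missing from its start are all assumptions.

```python
def expand_variable_end(spec: str, min_length: int = 3) -> list[str]:
    """Expand the hypothetical '#' notation into anchored adapter strings.

    'ADAPTER#' -> anchored 5' adapters, truncated at the adapter's end.
    '#ADAPTER' -> anchored 3' adapters, truncated at the adapter's start
                  (an assumed interpretation of the proposal).
    """
    five_prime = spec.endswith("#") and not spec.startswith("#")
    three_prime = spec.startswith("#") and not spec.endswith("#")
    if not (five_prime or three_prime):
        raise ValueError("expected exactly one '#' at one end of the adapter")
    adapter = spec.strip("#")
    if not 1 <= min_length <= len(adapter):
        raise ValueError("min_length must be between 1 and len(adapter)")
    kept_lengths = range(len(adapter), min_length - 1, -1)
    if five_prime:
        # Keep the start of the adapter, anchor at the start of the read.
        return ["^" + adapter[:n] for n in kept_lengths]
    # Keep the end of the adapter, anchor at the end of the read.
    return [adapter[len(adapter) - n:] + "$" for n in kept_lengths]


print(expand_variable_end("ADAPTER#"))
print(expand_variable_end("#ADAPTER"))
```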


Regarding usage: I'm using cutadapt to remove a known part of a fusion in order to keep the part that it was fused to. While the fused part should be of a certain length, the reagents involved are not 100% accurate, which is why I need the flexibility.


Regarding --no-indels: If it were just a simple warning, I'd probably not add the --no-indels argument, but

  1. The warning was quite severe:
WARNING: The adapters are too similar. When creating the index, 346687 ambiguous sequences were found that cannot be assigned uniquely.
WARNING: For example, '<REDACTED>', when found in a read, would result in 24 matches for both ADAPTER-61 '<REDACTED>' and ADAPTER-62 '<REDACTED>'
WARNING: Reads with ambiguous sequence will *not* be trimmed.

and

  2. It was immediately followed by an exception:
Traceback (most recent call last):
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/bin/cutadapt", line 8, in <module>
    sys.exit(main_cli())
             ^^^^^^^^^^
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/cli.py", line 1148, in main_cli
    main(sys.argv[1:])
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/cli.py", line 1228, in main
    pipeline = make_pipeline_from_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/cli.py", line 939, in make_pipeline_from_args
    modifiers.extend(
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/cli.py", line 1081, in make_adapter_cutter
    adapter_cutter2 = AdapterCutter(adapters2, times, action, allow_index)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/modifiers.py", line 112, in __init__
    self._regroup_into_indexed_adapters(adapters)
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/modifiers.py", line 132, in _regroup_into_indexed_adapters
    result.append(IndexedPrefixAdapters(prefix))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/adapters.py", line 1503, in __init__
    self._index = AdapterIndex(adapters, prefix=True)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/adapters.py", line 1258, in __init__
    self._lengths, self._index, self._ambiguous = self._make_index()
                                                  ^^^^^^^^^^^^^^^^^^
  File "/Users/yossifarjoun/micromamba/envs/cutadapt/lib/python3.12/site-packages/cutadapt/adapters.py", line 1412, in _make_index
    del index[s]
        ~~~~~^^^
KeyError: '<REDACTED>'

P.S. Sorry for the redactions. I cannot share the details of the sequence that I'm trimming.

yfarjoun avatar May 20 '25 21:05 yfarjoun

Maybe helpful as a start: The KeyError crash is fixed in Cutadapt 5.1, which I just released.

marcelm avatar May 28 '25 07:05 marcelm

Thanks! I'll give it a try.

yfarjoun avatar May 28 '25 13:05 yfarjoun

The KeyError crash is indeed fixed for me, even in version 5.0! Thanks!!

yfarjoun avatar Jun 19 '25 02:06 yfarjoun