Should I use -a or -g when demultiplexing ONT reads with dual barcodes?

Open ashleyp1 opened this issue 1 year ago • 3 comments

cutadapt 4.9

I have 16S amplicon reads, sequenced with ONT, that I am trying to demultiplex. Each sample was PCR-barcoded with a 13-base barcode on both ends, so I expect a read to start with a barcode and end with its reverse complement. I put together a FASTA file of all my barcode pairs; some are listed below.

>HL001_FW
ATCCGGTCGGAGA...TCTCCGACCGGAT
>HL002_FW
CTGAGGTGATCAG...CTGATCACCTCAG
>HL003_FW
AGTGTCCTGCTAG...CTAGCAGGACACT
>HL004_FW
ATAAGCAATTCGA...TCGAATTGCTTAT

The problem I run into is whether to use the -a or -g flag. Looking through the documentation, I see them used almost interchangeably for linked adapters, but I get different outputs depending on which one I use, and I'm not sure which is correct. For reference, I used the commands below:

cutadapt -e 1 -a file:barcodes_for_cutadapt.fasta -o trimmed-{name}.fastq.gz reads.fastq.gz

cutadapt -e 1 -g file:barcodes_for_cutadapt.fasta -o trimmed-{name}.fastq.gz reads.fastq.gz

ashleyp1 avatar Aug 09 '24 21:08 ashleyp1

The difference between -a and -g for linked adapters lies in which adapters are required to be in the read; see https://cutadapt.readthedocs.io/en/stable/guide.html#linked-override.

For -g, both adapters are required. For -a, only anchored adapters are required; non-anchored adapters are optional.

The distinction between required and optional only matters for linked adapters (the ones with the ... in the middle) and determines what happens when one of the two constituent adapters is not found.

The rules are like this:

  • If an adapter is required, but not found in the read, the read is not trimmed, even if the other adapter was found.
  • If an adapter is optional and not found in the read, the other adapter may still be trimmed from the read (if found).
  • Anchored adapters are always considered required. (Irrelevant here because you don’t use anchored adapters.)
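The rules above can be sketched as a toy shell function. This is only an illustration under stated assumptions, not how cutadapt is implemented: trim_linked is a hypothetical helper, it uses exact substring matching (no error tolerance, no partial matches), and the insert sequence is made up. The barcode is HL001_FW from the question.

```shell
#!/bin/sh
# Toy model of the required/optional rules for one linked adapter FRONT...BACK.
# Assumption: exact substring matching only; real cutadapt aligns with errors.
# Usage: trim_linked READ FRONT BACK FRONT_REQUIRED BACK_REQUIRED  (flags: yes/no)
trim_linked() {
  read_seq=$1; front=$2; back=$3; front_required=$4; back_required=$5
  rest=$read_seq; front_found=no; back_found=no

  # 5' adapter: if present, drop it and everything before it.
  case $rest in
    *"$front"*) front_found=yes; rest=${rest#*"$front"} ;;
  esac
  # 3' adapter: searched in what remains; drop its first occurrence and everything after it.
  case $rest in
    *"$back"*) back_found=yes; rest=${rest%%"$back"*} ;;
  esac

  if [ "$front_required" = yes ] && [ "$front_found" = no ]; then
    printf '%s\n' "$read_seq"   # required adapter missing: read stays untrimmed
  elif [ "$back_required" = yes ] && [ "$back_found" = no ]; then
    printf '%s\n' "$read_seq"   # required adapter missing: read stays untrimmed
  else
    printf '%s\n' "$rest"       # whatever was found has been trimmed
  fi
}

bc=ATCCGGTCGGAGA        # HL001_FW barcode from the question
bc_rc=TCTCCGACCGGAT     # its reverse complement
ins=ACGTACGTTTGCA       # made-up insert sequence

# Like -g (both required): a read lacking the 3' barcode is left untouched.
trim_linked "${bc}${ins}" "$bc" "$bc_rc" yes yes
# Like -a with non-anchored adapters (both optional): the 5' barcode is still trimmed.
trim_linked "${bc}${ins}" "$bc" "$bc_rc" no no
# Both barcodes present: the insert comes out under either setting.
trim_linked "${bc}${ins}${bc_rc}" "$bc" "$bc_rc" yes yes
```

The first call prints the read unchanged, while the second and third print only the insert.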

So if you know your reads are long enough that you should see both barcodes, or if you want to ensure that only full-length sequences end up in your demultiplexed output, use -g. If you want to be less strict, use -a.

(With -a, you could also make the first adapter required and leave the second one optional by writing this in the FASTA file: ATAAGCAATTCGA;required...TCGAATTGCTTAT.)
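If there are many barcodes, entries of this form could be generated rather than typed by hand. The following is a minimal sketch, assuming a hypothetical two-column list barcodes.txt (name, then barcode) and uppercase A/C/G/T sequences only; the file names and the revcomp helper are made up for illustration.

```shell
#!/bin/sh
set -eu

# Reverse-complement a DNA sequence (uppercase A/C/G/T only).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Hypothetical barcode list: "name sequence" per line (two barcodes from the question).
cat > barcodes.txt <<'EOF'
HL001_FW ATCCGGTCGGAGA
HL004_FW ATAAGCAATTCGA
EOF

# One linked adapter per barcode: the 5' barcode is marked required,
# the 3' reverse complement is left as is (optional when used with -a).
while read -r name seq; do
  printf '>%s\n%s;required...%s\n' "$name" "$seq" "$(revcomp "$seq")"
done < barcodes.txt > barcodes_first_required.fasta

cat barcodes_first_required.fasta
```

The resulting file would then be passed the same way as in the original commands, for example with -a file:barcodes_first_required.fasta.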

marcelm avatar Aug 11 '24 19:08 marcelm

Thanks for the quick answer! That definitely clears things up for me.

I have a follow-up question, though, after reading through the documentation some more. When demultiplexing, does cutadapt require the complete barcode to be present for it to count? For example, for BARCODE, would it identify and trim BARCODEsequence but not CODEsequence? Basically, I want to make sure that I only keep reads with a complete barcode.

ashleyp1 avatar Aug 13 '24 17:08 ashleyp1

To require the full barcode to be present, use an anchored adapter. You can either add the ^ to each sequence in the FASTA file:

>HL001_FW
^ATCCGGTCGGAGA...TCTCCGACCGGAT

or, as a shortcut, put the ^ in front of file:, like so: cutadapt -a ^file:barcodes_for_cutadapt.fasta.
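One way to check this on your own setup is a tiny test file with one read that starts with the complete barcode and one that is missing the first few bases. This is only a sketch: the demo_* file names, the insert sequence, and the fastq_record helper are made up, and the cutadapt call runs only if cutadapt is installed.

```shell
#!/bin/sh
set -eu

# Linked adapter with an anchored 5' barcode (HL001_FW from the question).
cat > demo_barcodes_anchored.fasta <<'EOF'
>HL001_FW
^ATCCGGTCGGAGA...TCTCCGACCGGAT
EOF

# Print one FASTQ record with a constant placeholder quality string.
fastq_record() {
  printf '@%s\n%s\n+\n%s\n' "$1" "$2" "$(printf '%s' "$2" | tr 'ACGT' 'IIII')"
}

insert=ACGTTGCAAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCC   # made-up insert
{
  # Complete barcode at the very start of the read.
  fastq_record full "ATCCGGTCGGAGA${insert}TCTCCGACCGGAT"
  # Same read with the first four barcode bases missing.
  fastq_record truncated "GGTCGGAGA${insert}TCTCCGACCGGAT"
} > demo_reads.fastq

# Demultiplex the two test reads; output files are named after the matching barcode.
if command -v cutadapt >/dev/null 2>&1; then
  cutadapt -e 1 -a file:demo_barcodes_anchored.fasta -o 'demo-{name}.fastq' demo_reads.fastq
fi
```

If anchoring behaves as described above, the read named full should end up in demo-HL001_FW.fastq and the read named truncated in demo-unknown.fastq.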

marcelm avatar Aug 18 '24 06:08 marcelm