cutadapt

Adapter Cutting and Demultiplexing

Open • gfill88 opened this issue 6 months ago • 1 comment

Hello,

I am using cutadapt 5.0 and Python 3.12.10. Cutadapt was installed using miniforge3.

I have a paired-end TruSeq Illumina multiplexed library (96 samples total). I am trying to remove adapters as well as demultiplex the reads based on their inline barcode sequences.

This is the command that I used:

cutadapt -e 0 --no-indels --pair-filter=both \
-g file:adapterA_fwd.fasta -G file:adapterA_fwd.fasta \
-a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
-o {name}.1.fastq -p {name}.2.fastq \
R1_001.fastq R2_001.fastq

The analysis runs and generates an R1 and an R2 FASTQ file for each of the 96 barcodes, as well as unknown R1 and R2 FASTQ files. I also get two unexpected files, 1.1.fastq and 1.2.fastq, which I have figured out are related to sequences that also carry the -a/-A adapter sequence. I thought that the analysis would first remove the sequencing adapters and then demultiplex the adapter-trimmed reads into R1/R2 FASTQ files corresponding to the barcodes in the files provided with -g/-G.

I have gotten around this issue by doing the following:

  1. Remove sequencing primer binding site read-through from the 3’ end:
cutadapt \
-a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
-A GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
-o {name}.1.fastq -p {name}.2.fastq \
R1.fastq R2.fastq
  2. Concatenate sequencing adapter-trimmed reads with untrimmed reads to prepare for demultiplexing:
cat 1.1.fastq unknown.1.fastq > R1_adapter_removed.fastq
cat 1.2.fastq unknown.2.fastq > R2_adapter_removed.fastq
  3. Demultiplex adapter-trimmed reads/remove adapter from the 5’ end:

cutadapt -e 0 --no-indels --pair-filter=both -g file:adapter_fwd.fasta -G file:adapter_fwd.fasta \
-o {name}.1.fastq -p {name}.2.fastq \
R1_adapter_removed.fastq \
R2_adapter_removed.fastq

Is this the best approach?

Thank you in advance!
~Gina

gfill88, Jul 08 '25 19:07

Hi, not sure if this is still relevant.

> cutadapt -e 0 --no-indels --pair-filter=both \
> -g file:adapterA_fwd.fasta -G file:adapterA_fwd.fasta \
> -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
> -o {name}.1.fastq -p {name}.2.fastq \
> R1_001.fastq R2_001.fastq

Since you do not use any filtering options, you don’t need --pair-filter=both. Even if you use filtering options, you usually don’t need it.
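Here is a toy Python model of the `--pair-filter` decision; it is not Cutadapt's code, and the function name `pair_is_filtered` is hypothetical. It shows why the option is irrelevant here. It covers two modes: `any` (the default, the pair is removed if either read meets a filtering criterion such as being shorter than `--minimum-length`) and `both` (the pair is removed only if both reads meet it).

```python
def pair_is_filtered(r1_hit: bool, r2_hit: bool, mode: str = "any") -> bool:
    """Toy model of --pair-filter (hypothetical helper, not Cutadapt's code).

    r1_hit / r2_hit: whether each read meets a filtering criterion,
    e.g. it is too short for a minimum-length filter.
    """
    if mode == "any":
        # Default: discard the pair if at least one read meets the criterion.
        return r1_hit or r2_hit
    if mode == "both":
        # Discard the pair only if both reads meet the criterion.
        return r1_hit and r2_hit
    raise ValueError(f"unknown mode: {mode!r}")


# Without any filtering options, no read ever meets a criterion,
# so the pair is kept regardless of the mode.
for mode in ("any", "both"):
    assert pair_is_filtered(False, False, mode) is False

# The modes differ only when exactly one read of the pair is affected.
print(pair_is_filtered(True, False, "any"))   # True
print(pair_is_filtered(True, False, "both"))  # False
```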

Also, -e 0 is very strict. If there is a single sequencing error in the barcode, the read will not be assigned to that barcode and will end up in the unknown files. I typically use -e 1 for barcodes in Illumina reads.
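The following toy Python sketch (not Cutadapt's algorithm) shows why `-e 0` loses reads. It assigns a read to the best-matching inline barcode allowing at most `max_errors` mismatches and no indels, mirroring `--no-indels`. It assumes the barcode sits at the very start of the read. The helper names and the barcode names and sequences are hypothetical. Cutadapt interprets an integer `-e` value of 1 or more as a number of errors rather than an error rate, so `-e 1` allows one mismatch.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, barcodes: dict[str, str], max_errors: int) -> str | None:
    """Toy model (hypothetical helper): name of the barcode with the fewest
    mismatches at the read's 5' start, or None ("unknown") if none is within
    max_errors. Mismatches only, no indels, barcode anchored at position 0."""
    best_name, best_errors = None, max_errors + 1
    for name, barcode in barcodes.items():
        if len(read) < len(barcode):
            continue
        errors = hamming(read[: len(barcode)], barcode)
        if errors < best_errors:
            best_name, best_errors = name, errors
    return best_name


# Hypothetical barcodes, as they might be listed in a barcode FASTA file.
barcodes = {"BC01": "ACGTAC", "BC02": "TTGCAG"}

# A read whose barcode ACGTAC was sequenced with one error as ACGTTC.
read = "ACGTTC" + "GGATCCAAGT"

print(assign_barcode(read, barcodes, max_errors=0))  # None -> "unknown"
print(assign_barcode(read, barcodes, max_errors=1))  # BC01
```

With `max_errors=0` the read is sent to "unknown" even though it clearly carries BC01, which is the read loss described above.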

> The analysis runs and generates an R1 and an R2 FASTQ file for each of the 96 barcodes, as well as unknown R1 and R2 FASTQ files. I also get two unexpected files, 1.1.fastq and 1.2.fastq, which I have figured out are related to sequences that also carry the -a/-A adapter sequence. I thought that the analysis would first remove the sequencing adapters and then demultiplex the adapter-trimmed reads into R1/R2 FASTQ files corresponding to the barcodes in the files provided with -g/-G.

Yeah, Cutadapt attempts to do demultiplexing based on all adapters, including the -a and -A ones. Demultiplexing needs an adapter name, and since you did not give one for the two -a and -A ones, Cutadapt uses an auto-generated name, which is just "1".
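Here is a toy Python illustration of how the `{name}` placeholder in `-o`/`-p` turns into file names; it is not Cutadapt's code, and the helper `demux_paths` and the barcode name `BC01` are hypothetical. Each read pair goes to the files named after the adapter it was assigned to, and pairs without a match go to `unknown`. An adapter given without a name carries an auto-generated name such as `1`. Adapters can be named on the command line with the `-a NAME=SEQUENCE` form.

```python
from __future__ import annotations


def demux_paths(
    adapter_name: str | None,
    r1_template: str = "{name}.1.fastq",
    r2_template: str = "{name}.2.fastq",
) -> tuple[str, str]:
    """Toy model (hypothetical helper, not Cutadapt's code): the pair of output
    files for a read pair assigned to `adapter_name`; None means no match."""
    name = "unknown" if adapter_name is None else adapter_name
    return r1_template.format(name=name), r2_template.format(name=name)


# A barcode named "BC01" in the FASTA file (hypothetical name).
print(demux_paths("BC01"))  # ('BC01.1.fastq', 'BC01.2.fastq')
# An adapter given with -a/-A but without a name, auto-named "1".
print(demux_paths("1"))     # ('1.1.fastq', '1.2.fastq')
# A read pair in which no adapter was found.
print(demux_paths(None))    # ('unknown.1.fastq', 'unknown.2.fastq')
```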

You are right that it is simplest to do this in two steps (first remove the sequencing primers, then demultiplex), but I don’t understand the way you are doing it.

> I have gotten around this issue by doing the following:
>
> 1. Remove sequencing primer binding site read-through from the 3’ end:
> cutadapt \
> -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
> -A GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
> -o {name}.1.fastq -p {name}.2.fastq \
> R1.fastq R2.fastq

Because you use {name}, this also does demultiplexing, which isn’t very useful when there is only one adapter per read. It’s better to use explicit file names (-o trimmed.1.fastq -p trimmed.2.fastq).

> 2. Concatenate sequencing adapter-trimmed reads with untrimmed reads to prepare for demultiplexing:
> cat 1.1.fastq unknown.1.fastq > R1_adapter_removed.fastq
> cat 1.2.fastq unknown.2.fastq > R2_adapter_removed.fastq

This is not needed if you avoid the demultiplexing as I suggest above.

> 3. Demultiplex adapter-trimmed reads/remove adapter from the 5’ end:
>
> cutadapt -e 0 --no-indels --pair-filter=both -g file:adapter_fwd.fasta -G file:adapter_fwd.fasta \
> -o {name}.1.fastq -p {name}.2.fastq \
> R1_adapter_removed.fastq \
> R2_adapter_removed.fastq

What you did above will work and give the desired result (except maybe that you get fewer reads because you used -e 0), but for reference, here’s the command I would use:

cutadapt \
  -j 8 \
  -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
  -A GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
  --interleaved \
  R1.fastq R2.fastq \
| cutadapt \
  -j 8 \
  --interleaved \
  -e 1 --no-indels \
  -g file:adapter_fwd.fasta \
  -G file:adapter_fwd.fasta \
  -o {name}.1.fastq -p {name}.2.fastq \
  -

This runs Cutadapt twice by piping the output from the first invocation directly into the second. This is more efficient because no intermediate files need to be written. I’ve also added -j 8 to run Cutadapt with multiple threads.
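The `--interleaved` option is what makes the pipe possible. With it, Cutadapt writes (or reads) both reads of a pair as one FASTQ stream in which each R1 record is immediately followed by its R2 record. The toy Python sketch below produces that record order from separate R1/R2 inputs; it does not involve Cutadapt. It assumes well-formed four-line FASTQ records, uses in-memory data, and the helper names are hypothetical.

```python
import io
from typing import Iterator, TextIO

Record = tuple[str, str, str, str]  # header, sequence, "+" line, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records (assumes well-formed, non-wrapped input)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def interleave(r1: TextIO, r2: TextIO, out: TextIO) -> None:
    """Write R1 and R2 records alternately: R1, R2, R1, R2, ..."""
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        for rec in (rec1, rec2):
            out.write("\n".join(rec) + "\n")


r1 = io.StringIO("@pair1/1\nACGT\n+\nIIII\n@pair2/1\nTTTT\n+\nIIII\n")
r2 = io.StringIO("@pair1/2\nGGCC\n+\nIIII\n@pair2/2\nAAAA\n+\nIIII\n")
out = io.StringIO()
interleave(r1, r2, out)

headers = [rec[0] for rec in read_fastq(io.StringIO(out.getvalue()))]
print(headers)  # ['@pair1/1', '@pair1/2', '@pair2/1', '@pair2/2']
```

Because the pairing is preserved by record order in a single stream, the second invocation can recover R1 and R2 from standard input without intermediate files.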

marcelm, Sep 09 '25 08:09