fastp

Adapters not removed: mismatches with insertions/deletions

Open marilyne9 opened this issue 5 years ago • 12 comments

Hello,

I used the --detect_adapter_for_pe option to remove adapters, but some adapters were still present in my contigs after assembly. I noticed that these adapter copies contain insertion/deletion mismatches. Can fastp recognize insertions/deletions and detect an adapter that has a single deleted base?

Here is my command line: ./fastp --in1 JC-0001_S929_L001_R1_001.fastq.gz --in2 JC-0001_S929_L001_R2_001.fastq.gz --out1 JC-0001_S929_L001_R1_001.cleaned.fastq.gz --out2 JC-0001_S929_L001_R2_001.clened.fastq.gz --correction --cut_front --cut_tail --detect_adapter_for_pe

Best regards,

Marilyne

marilyne9 (Apr 21 '20 00:04)

Could you please paste some reads here whose adapters couldn't be removed?

sfchen (Apr 21 '20 00:04)

Hello,

I can give you one example alignment between a contig and an adapter: in the contig (or the original read) there is an inserted 'C'.

NODE_89_lengt  9213  GGCCTCGTCCT  9223
                     ||||| |||||
adapter2          1  GGCCT-GTCCT  10

marilyne9 (Apr 21 '20 01:04)
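The alignment above shows why a mismatch-only comparison struggles with indels: after the inserted 'C', every downstream base is shifted by one position. The Python sketch below compares a position-by-position (Hamming) mismatch count with a Levenshtein edit distance, which also counts insertions and deletions. It is a generic illustration, not fastp's adapter-matching code, and the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatches position by position (no insertions/deletions)."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


contig_segment = "GGCCTCGTCCT"  # contig bases 9213-9223, with the extra 'C'
adapter2 = "GGCCTGTCCT"         # adapter2 bases 1-10

print(hamming(contig_segment[:len(adapter2)], adapter2))  # 4
print(levenshtein(contig_segment, adapter2))              # 1
```

With one indel, the shifted comparison reports 4 mismatches in 10 bases, while the edit distance is 1, so a matcher that only tolerates substitutions can miss an adapter copy that an indel-aware one would accept.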

Please give me the reads containing this adapter.

I will debug it with fastp.

sfchen (Apr 21 '20 01:04)

I don't know which read it originally came from; there are millions of reads. This adapter appeared in only one contig, at its end. I could map the reads back to the contig, but it is not easy to identify which read was not trimmed. Do you have any idea why this happens?

marilyne9 (Apr 21 '20 01:04)
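One way to narrow down which reads carry an untrimmed adapter copy with up to one error is to enumerate every sequence within one edit (substitution, insertion or deletion) of the adapter and test each read for any of them. The Python sketch below does this; the function names are hypothetical, it assumes a well-formed four-line FASTQ, and because the number of variants grows with adapter length it is meant for spot checks, not for full-scale trimming.

```python
import gzip
from typing import Iterable, Iterator


def one_edit_variants(seq: str, alphabet: str = "ACGT") -> set[str]:
    """All sequences within edit distance 1 of seq (including seq itself)."""
    variants = {seq}
    for i in range(len(seq)):
        variants.add(seq[:i] + seq[i + 1:])             # deletion
        for base in alphabet:
            variants.add(seq[:i] + base + seq[i + 1:])  # substitution
    for i in range(len(seq) + 1):
        for base in alphabet:
            variants.add(seq[:i] + base + seq[i:])      # insertion
    return variants


def reads_with_adapter(fastq_lines: Iterable[str], adapter: str) -> Iterator[str]:
    """Yield the header of every read containing the adapter with <= 1 edit."""
    variants = one_edit_variants(adapter)
    records = iter(fastq_lines)
    # zip over the same iterator groups lines into (header, seq, '+', qual).
    for header, seq, _plus, _qual in zip(records, records, records, records):
        seq = seq.strip()
        if any(v in seq for v in variants):
            yield header.strip()


def scan_fastq_gz(path: str, adapter: str) -> list[str]:
    """Hypothetical helper: scan a gzipped FASTQ file and collect matching headers."""
    with gzip.open(path, "rt") as fh:
        return list(reads_with_adapter(fh, adapter))
```

For example, scan_fastq_gz("JC-0001_S929_L001_R1_001.fastq.gz", "GGCCTGTCCT") would list the R1 headers containing adapter2 with at most one error; running it on the R2 file as well, and mapping only those reads back to the contig, could be a much smaller job than mapping everything.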

Hello, here are a few reads:

@A00814:76:HNLLGDSXX:1:1101:29369:1000 1:N:0:NTGGTTGACT+GGCCTGTCCT
TCCTCCTCCTCCTCCTCCGAATCTCCACTCAAAAAGACAAAGTCCTCCTCCGAGTCCTCCTCCGAGTCCGAACTCGACTCCGAGACGGACGAAACCGCCCGCGAGTCTGCGCCGACCGACCGCACGGCCGACTCAAACTCGCTCAGGTT
+
FFFFFFFFFFFFFFFFFFF,:,FFFFFF,F:FFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFF
@A00814:76:HNLLGDSXX:1:1101:29912:1000 1:N:0:NTGGTTGACT+GGCCTGTCCT
GTGCGTACTGACTCTGATCAGAATTGATTACCTGCCATGTGCGTGGGGGAAACTGACAACGGGGCATCTCCTTTCTCCAGAGCGGCGGCGGCAACGACGACGAGGCGCCTGATTCCCTCCTCGAAATGGCCGCCGAGATGGAACGCACCG
+
F:FFF,FFFFFF,FFF:FF,F::FFFFFFFFFFFF,FFF:FFFFFF:FFF:,FFFFF:FFFFFFF::FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:FFFF:FFFF:FFFFFFFFFFFFF,F,FFFFFFFFFFFFFFFF,FFFFFFF
@A00814:76:HNLLGDSXX:1:1101:31539:1000 1:N:0:NTGGTTGACT+GGCCTGTCCT
CGCAAGACGACTCACGGGCGAATCGCTATCCGCACGACGGTGTGGTAAAGGTCTGCCAACTCGCCCCGCTTGACCGAGACGGCCAACGGGACTCGGATCGACAACTTGCGGCAGTACGTATCCGGCTGGACTTCGTGGCGCTGTTTCGCG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:76:HNLLGDSXX:1:1101:1841:1016 1:N:0:NTGGTTGACT+GGCCTGTCCT
CCTTATAGTGTTTATGTAAGTATAAGATAGGTATGTATGGCATGTGATGTAATTATACGTATCTTATATGACACTTCCTACTAATAGTAACGTCGAGTCTTCTTGACCATCGTATAGACTATCATTGCTATACGTTTCTATCGTGAAGGG
+
FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFF,FFFFF:F

marilyne9 (Apr 21 '20 19:04)

Hello, is this what you need?

marilyne9 (Apr 21 '20 22:04)

Hello, could I get some help debugging this?

marilyne9 (Apr 22 '20 12:04)

Hi, I will debug this within a few days.

sfchen (Apr 22 '20 23:04)

I had the same problem: read2 adapters can't be removed.

wfgui (May 23 '20 02:05)

Hi @sfchen,

I have also found that fastp cannot handle "read-through" reads correctly.

[Image: diagram from Trimmomatic]

Read1 fastq file:

@K00159:45:H4W76BBXX:5:1105:20791:9167 1:N:0:CAACTA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAGAGCACACGTCTGAACTCCAGTCACCACGATATCTCGTATGCCGTCATCTGCCAGAAAAAAG
+
AAFFFJJ7JFFFJJJFAFJJJJJJJJFJJJFJJAFJJAJA-F7JJFFAJFJJJJJJJJJJ<-FJJJJJJJ<JFJFAF-<JFJFJJAJJJFJFJJJFJJFFJ<FAF<FFJFJJ7JJJJJJJJJ-FF-FJAJFJ
@K00159:45:H4W76BBXX:5:1105:12063:9027 1:N:0:CAACTA
CCGACTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAATTTTTTTAAAAATTTACCTCCACAACCGCAAGCACAACGTACAAAGAATTCCGGGCGCACGGGCCCGACCG
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ

Read2 fastq file:

@K00159:45:H4W76BBXX:5:1105:20791:9167 2:N:0:CAACTA
CGTCGTCCGGCTGTACACCTCTCAAGGTGTACTTCTCGGTGGCCGTATCATTATTAAAAACTTTTTTTCAAGCAGAAGACGGCATACGAGATTGAGTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
+
A--77FFJ-<FF-FFF-AF7AJA-JJJ-FFF-<<J-F-FA-<F-AA-<<-<AAA-FF---AAFFAAJAF-7-<F-7AJFA<FJAAAJFAJJF<A-<7AJFF7-FF-<-<-<7FF<7FFFJAF-7<AFAA<J<
@K00159:45:H4W76BBXX:5:1105:12063:9027 2:N:0:CAACTA
CAGTCGGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAATTTTTTTCAAGCAGAAGACGGCACACCAGATTAGTTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCACTCAG
+
AAFFFFJFFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJFJJJFJFJJJJJJJJJJJJJFJJJJJJJJJJJJJAJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFFFJFFJJJ

y9c (Sep 06 '20 15:09)
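In a read-through pair, the insert is shorter than the read, so read 1 begins with the insert and read 2 begins with the insert's reverse complement; the bases after that are adapter. An insert length can therefore be estimated from R1 and R2 alone, without knowing the adapter. The Python sketch below illustrates that idea under simplifying assumptions (exact or near-exact agreement, a fixed minimum length, hypothetical function names); it is not Trimmomatic's palindrome mode or fastp's overlap analysis.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def infer_insert_length(r1: str, r2: str, min_len: int = 5,
                        max_mismatch: int = 0) -> Optional[int]:
    """Longest L (>= min_len) for which r1[:L] matches revcomp(r2[:L]).

    Both reads start with the insert (R2 on the opposite strand), so their
    first L bases agree once R2 is reverse-complemented; bases past L are
    adapter. Returns None if no L >= min_len agrees.
    """
    n = min(len(r1), len(r2))
    for length in range(n, min_len - 1, -1):
        head1 = r1[:length]
        head2 = revcomp(r2[:length])
        mismatches = sum(a != b for a, b in zip(head1, head2))
        if mismatches <= max_mismatch:
            return length
    return None


# Synthetic pair: a 12 bp insert followed by adapter-like tails invented for the example.
insert = "CCGACTGTTAGC"
r1 = insert + "AGATCGGAAGAGCACACG"
r2 = revcomp(insert) + "AGATCGGAAGAGCGTCGT"
length = infer_insert_length(r1, r2)
print(length, r1[:length])  # 12 CCGACTGTTAGC
```

Trimming both reads to their first L bases then removes the read-through adapter. Two caveats: a small min_len makes chance matches more likely, which is why overlap-based approaches typically require a minimum overlap; and an insert of length 0, as in the first pair above where read 1 starts directly with AGATCGGAAGAGC, leaves nothing to compare, so adapter-sequence matching is still needed for such reads.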

fastp cannot handle some adapters in SE mode either. For example:

@K00159:45:H4W76BBXX:5:1103:11931:62171 1:N:0:CAACTA
GAGGACGAGGAGATCGGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAACAAAACACGCCTATCGCCGGCGAAGTCCGGAAACAAGCAAAACAAAAGTCCAACCGGGGGAGAGCCTAC
+
AAAFFFJ7JJJAFFFA-AJJJJJJJJJJJJJJJJJJJJJJJAJFJJJJJJJJJJJJJJJJJJJJJJJJFFJFJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJ-FAJJJFJA7FJJJJJJJJJJJJJJJJJJFFJJFJJFJFJF
@K00159:45:H4W76BBXX:5:1103:19157:62277 1:N:0:CAACTA
GAAAGTCGGAGATCGGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAGAACGGTGGTGCCCCGCTGGGAGAGGTTAACCACAACAAGAAAAATAAATGTCGCCACTGAGGCCAAC
+
AAFFFFJFJJJJJAFFJFJJJJJJJJAJJAJFJA<FFJFFJ<JJJJJJJJAFFJFJJJJJJJJJJJJFFFFJFFJJJJJJJJJJJJJJFFJJFJF<JJJJJJJJFJFAFAJJJFFJFFJFFJAAJJJAJJAJJJJJAFFFJJJJFJJJJJ
@K00159:45:H4W76BBXX:5:1103:15270:63032 1:N:0:CAACTA
AGATCGGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAATTTTTTTTTAATGTTCGGCGACCCCCCGGGATTTACCCTTCCAAAGTTTAAAAGCCCGCCGACCGGGCGGGGAAGGGA
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF

(Illumina TruSeq read 1 adapter)

y9c (Sep 06 '20 17:09)
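In all three reads above, the adapter copy reads AGATCGGGAAGAGC, that is, the TruSeq read 1 adapter with one extra G, so an exact search for AGATCGGAAGAGC does not find it. The Python sketch below locates an adapter while allowing substitutions, insertions and deletions: for each candidate start it computes an edit distance in which the adapter may either be fully present or run off the 3' end of the read. The adapter constant is the read 1 sequence quoted later in this thread; the thresholds (max_edits=1, min_overlap=10) and function names are illustrative assumptions, not fastp settings, and the per-start dynamic programming makes this a demonstration rather than a production trimmer.

```python
from typing import Optional

TRUSEQ_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def overlap_edit_distance(adapter: str, tail: str, min_overlap: int) -> Optional[int]:
    """Edit distance between the adapter and the start of `tail`.

    The alignment must begin at tail[0]. It may end when the whole adapter is
    used (later read bases are ignored) or when the read runs out (only an
    adapter prefix is present). Returns None if no end point is long enough.
    """
    m, n = len(adapter), len(tail)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                                    # base missing from read
                d[i][j - 1] + 1,                                    # extra base in read
                d[i - 1][j - 1] + (adapter[i - 1] != tail[j - 1]),  # match/substitution
            )
    ends = [d[m][j] for j in range(min_overlap, n + 1)]   # whole adapter present
    ends += [d[i][n] for i in range(min_overlap, m + 1)]  # read ends inside adapter
    return min(ends) if ends else None


def find_adapter_start(read: str, adapter: str = TRUSEQ_R1,
                       max_edits: int = 1, min_overlap: int = 10) -> Optional[int]:
    """Start of the lowest-cost match (earliest on ties), or None if none is within max_edits."""
    best = None  # (cost, start)
    for start in range(len(read) - min_overlap + 1):
        cost = overlap_edit_distance(adapter, read[start:], min_overlap)
        if cost is not None and cost <= max_edits and (best is None or cost < best[0]):
            best = (cost, start)
    return None if best is None else best[1]


def trim(read: str, **kwargs) -> str:
    """Cut the read at the adapter start, or return it unchanged if none is found."""
    start = find_adapter_start(read, **kwargs)
    return read if start is None else read[:start]


# Synthetic read: 20 bp of insert, then the adapter with an extra G after base 7.
example = "C" * 20 + TRUSEQ_R1[:7] + "G" + TRUSEQ_R1[7:]
print(find_adapter_start(example), trim(example))  # 20 CCCCCCCCCCCCCCCCCCCC
```

Choosing the lowest-cost start, rather than the first start under the threshold, avoids cutting one base too early: a perfect adapter copy at position p also aligns from p - 1 at a cost of one extra base.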

Just thought I'd update this, as it would be nice if sequencing errors in the adapter sequences could be taken care of.

Adapter used for R1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

All of these get successfully trimmed, but it seems that sequencing errors/insertions/deletions in the adapter sequence are not taken care of:

(In each example, the second sequence is the error-free adapter; "_" marks the missing base.)

e.g. 1
AGAGATCGGAAG_GCACACGTCTGAACTCCAGTCA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

e.g. 2
AGATCGGAAGAGCACACGTCTGA_CTCCAGTCA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

e.g. 3
AGAGATCG_AAGAGCACACGTCTGAACTCCAGTCA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

EisenRa (May 17 '22 03:05)
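To check what kind of error each adapter copy in these examples carries, Python's standard-library difflib can list the differing segments between the error-free adapter and an observed copy (with the "_" placeholder removed). This is a heuristic diff built from longest matching blocks, not a minimum-edit-distance alignment, and describe_edits is a hypothetical helper name. Bases that belong to the read itself, such as the AG before the adapter in e.g. 1 and e.g. 3, show up as an insert.

```python
from difflib import SequenceMatcher

ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def describe_edits(reference: str, observed: str) -> list[tuple[str, str, str]]:
    """List (operation, reference segment, observed segment) for each difference."""
    matcher = SequenceMatcher(None, reference, observed, autojunk=False)
    return [
        (tag, reference[i1:i2], observed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


examples = {
    "e.g. 1": "AGAGATCGGAAGGCACACGTCTGAACTCCAGTCA",
    "e.g. 2": "AGATCGGAAGAGCACACGTCTGACTCCAGTCA",
    "e.g. 3": "AGAGATCGAAGAGCACACGTCTGAACTCCAGTCA",
}
for label, observed in examples.items():
    print(label, describe_edits(ADAPTER_R1, observed))
# e.g. 1 [('insert', '', 'AG'), ('delete', 'A', '')]
# e.g. 2 [('delete', 'A', '')]
# e.g. 3 [('insert', '', 'AG'), ('delete', 'G', '')]
```

Each adapter copy differs from the reference by a single deleted base, which is the case a substitution-only adapter match cannot absorb.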