fastp
adapters not removed - mismatches insertion/deletion
Hello,
I used the --detect_adapter_for_pe option to remove adapters, but some adapters were still present in my contigs after assembly. I noticed that the remaining adapter copies differ from the adapter sequence by an insertion or a deletion. Can fastp recognize mismatches that are insertions or deletions, and detect an adapter with one deletion?
Here is my command line: ./fastp --in1 JC-0001_S929_L001_R1_001.fastq.gz --in2 JC-0001_S929_L001_R2_001.fastq.gz --out1 JC-0001_S929_L001_R1_001.cleaned.fastq.gz --out2 JC-0001_S929_L001_R2_001.clened.fastq.gz --correction --cut_front --cut_tail --detect_adapter_for_pe
Best regards,
Marilyne
Could you please paste some reads here whose adapters couldn't be removed?
Hello,
I can give you one alignment between a contig and an adapter as an example. In the contig (or the original read) there is a 'C' inserted:
NODE_89_lengt  9213  GGCCTCGTCCT  9223
                     ||||| |||||
adapter2          1  GGCCT-GTCCT  10
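To make the difference concrete, here is a minimal sketch comparing the two sequences from the alignment above in two ways: position by position (substitutions only) and with an edit distance that also allows insertions and deletions. The function names are illustrative, and nothing here is meant to describe how fastp compares adapters internally.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatches when the strings are compared position by position."""
    return sum(1 for x, y in zip(a, b) if x != y)


def levenshtein(a: str, b: str) -> int:
    """Edit distance allowing substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # drop x from a
                           cur[j - 1] + 1,            # add y to a
                           prev[j - 1] + (x != y)))   # match or substitute
        prev = cur
    return prev[-1]


contig_fragment = "GGCCTCGTCCT"  # NODE_89, positions 9213-9223
adapter2 = "GGCCTGTCCT"          # the sequence labelled adapter2 above

print(hamming(contig_fragment, adapter2))      # 4: everything after the extra C is shifted
print(levenshtein(contig_fragment, adapter2))  # 1: a single inserted base
```

A single inserted base shifts every following position, so a substitution-only comparison sees four mismatches in ten bases while the edit distance is one.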
Please give me the reads containing this adapter.
I will debug it with fastp.
I do not know which read it was originally. There are millions of reads. This adapter was found in only one contig, at the end. Maybe I could map the reads to the contig, but it is not so easy to identify which read was not trimmed. Do you have any idea why this happens?
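One way to look for candidate reads without mapping is to scan the cleaned FASTQ for reads that still contain the adapter with at most one edit. The sketch below is only an assumption about how one might do that with the Python standard library; the function names and the command-line usage are hypothetical. A 10-base pattern with one allowed edit will also match some reads by chance, so the hits are candidates to inspect, not proof, and the reverse complement of the adapter may need to be checked too.

```python
import gzip
import sys
from typing import Iterator, Tuple


def best_approx_match(pattern: str, text: str) -> Tuple[int, int]:
    """Return (edits, end) for the best occurrence of pattern inside text.

    The whole pattern must be matched, but the match may start anywhere in
    the text at no cost. `end` is the text index just past the match.
    """
    prev = list(range(len(pattern) + 1))  # costs against an empty text prefix
    best = (prev[-1], 0)
    for j, t in enumerate(text, start=1):
        cur = [0]  # a match may begin at any text position for free
        for i, p in enumerate(pattern, start=1):
            cur.append(min(prev[i] + 1,             # extra base in the text
                           cur[i - 1] + 1,          # pattern base missing from the text
                           prev[i - 1] + (p != t))) # match or substitution
        if cur[-1] < best[0]:
            best = (cur[-1], j)
        prev = cur
    return best


def reads_with_adapter(fastq_gz: str, adapter: str,
                       max_edits: int = 1) -> Iterator[Tuple[str, int, int]]:
    """Yield (header, edits, end) for reads containing the adapter approximately."""
    with gzip.open(fastq_gz, "rt") as handle:
        while True:
            header = handle.readline().rstrip()
            if not header:
                break
            seq = handle.readline().rstrip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            edits, end = best_approx_match(adapter, seq)
            if edits <= max_edits:
                yield header, edits, end


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python scan.py reads.cleaned.fastq.gz GGCCTGTCCT
    for header, edits, end in reads_with_adapter(sys.argv[1], sys.argv[2]):
        print(header, edits, end, sep="\t")
```

This is plain Python and will be slow on millions of reads; it is meant for spot checks on a subset of the data.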
Hello, here are a few reads:
@A00814:76:HNLLGDSXX:1:1101:29369:1000 1:N:0:NTGGTTGACT+GGCCTGTCCT
TCCTCCTCCTCCTCCTCCGAATCTCCACTCAAAAAGACAAAGTCCTCCTCCGAGTCCTCCTCCGAGTCCGAACTCGACTCCGAGACGGACGAAACCGCCCGCGAGTCTGCGCCGACCGACCGCACGGCCGACTCAAACTCGCTCAGGTT
+
FFFFFFFFFFFFFFFFFFF,:,FFFFFF,F:FFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFF
@A00814:76:HNLLGDSXX:1:1101:29912:1000 1:N:0:NTGGTTGACT+GGCCTGTCCT
GTGCGTACTGACTCTGATCAGAATTGATTACCTGCCATGTGCGTGGGGGAAACTGACAACGGGGCATCTCCTTTCTCCAGAGCGGCGGCGGCAACGACGACGAGGCGCCTGATTCCCTCCTCGAAATGGCCGCCGAGATGGAACGCACCG
+
F:FFF,FFFFFF,FFF:FF,F::FFFFFFFFFFFF,FFF:FFFFFF:FFF:,FFFFF:FFFFFFF::FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:FFFF:FFFF:FFFFFFFFFFFFF,F,FFFFFFFFFFFFFFFF,FFFFFFF
@A00814:76:HNLLGDSXX:1:1101:31539:1000 1:N:0:NTGGTTGACT+GGCCTGTCCT
CGCAAGACGACTCACGGGCGAATCGCTATCCGCACGACGGTGTGGTAAAGGTCTGCCAACTCGCCCCGCTTGACCGAGACGGCCAACGGGACTCGGATCGACAACTTGCGGCAGTACGTATCCGGCTGGACTTCGTGGCGCTGTTTCGCG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00814:76:HNLLGDSXX:1:1101:1841:1016 1:N:0:NTGGTTGACT+GGCCTGTCCT
CCTTATAGTGTTTATGTAAGTATAAGATAGGTATGTATGGCATGTGATGTAATTATACGTATCTTATATGACACTTCCTACTAATAGTAACGTCGAGTCTTCTTGACCATCGTATAGACTATCATTGCTATACGTTTCTATCGTGAAGGG
+
FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFF,FFFFF:F
Hello, is this what you need?
Hello, could I get some help debugging this?
Hi, I will debug this within a few days.
I had the same problem: the read2 adapters could not be removed.
Hi @sfchen,
I have also found that fastp cannot deal with "read through" reads correctly.
A diagram from Trimmomatic:

Read1 fastq file:
@K00159:45:H4W76BBXX:5:1105:20791:9167 1:N:0:CAACTA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAGAGCACACGTCTGAACTCCAGTCACCACGATATCTCGTATGCCGTCATCTGCCAGAAAAAAG
+
AAFFFJJ7JFFFJJJFAFJJJJJJJJFJJJFJJAFJJAJA-F7JJFFAJFJJJJJJJJJJ<-FJJJJJJJ<JFJFAF-<JFJFJJAJJJFJFJJJFJJFFJ<FAF<FFJFJJ7JJJJJJJJJ-FF-FJAJFJ
@K00159:45:H4W76BBXX:5:1105:12063:9027 1:N:0:CAACTA
CCGACTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAATTTTTTTAAAAATTTACCTCCACAACCGCAAGCACAACGTACAAAGAATTCCGGGCGCACGGGCCCGACCG
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
Read2 fastq file:
@K00159:45:H4W76BBXX:5:1105:20791:9167 2:N:0:CAACTA
CGTCGTCCGGCTGTACACCTCTCAAGGTGTACTTCTCGGTGGCCGTATCATTATTAAAAACTTTTTTTCAAGCAGAAGACGGCATACGAGATTGAGTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
+
A--77FFJ-<FF-FFF-AF7AJA-JJJ-FFF-<<J-F-FA-<F-AA-<<-<AAA-FF---AAFFAAJAF-7-<F-7AJFA<FJAAAJFAJJF<A-<7AJFF7-FF-<-<-<7FF<7FFFJAF-7<AFAA<J<
@K00159:45:H4W76BBXX:5:1105:12063:9027 2:N:0:CAACTA
CAGTCGGGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAATTTTTTTCAAGCAGAAGACGGCACACCAGATTAGTTGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCACTCAG
+
AAFFFFJFFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJFJJJFJFJJJJJJJJJJJJJFJJJJJJJJJJJJJAJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJFFFJFFJJJ
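For context on what "read through" means in a typical paired-end library: when the insert is shorter than the read length, read 1 and the reverse complement of read 2 cover the same insert, and whatever follows in each read is adapter. The sketch below illustrates that general overlap idea on a synthetic pair. It is not fastp's or Trimmomatic's implementation, and the function names, thresholds, synthetic insert, and the read 2 adapter string are assumptions made for illustration. It only counts substitutions, so an indel inside the overlap would defeat it as well.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string made of A, C, G, T and N."""
    return seq.translate(_COMPLEMENT)[::-1]


def insert_length_by_overlap(read1: str, read2: str,
                             min_overlap: int = 30,
                             max_mismatch_rate: float = 0.1) -> Optional[int]:
    """Return the inferred insert length for a read-through pair, else None.

    For an insert of length n shorter than the reads, read1[:n] should equal
    the reverse complement of read2[:n]; bases past n are adapter.
    """
    limit = min(len(read1), len(read2))
    for n in range(limit, min_overlap - 1, -1):  # try the longest insert first
        candidate = reverse_complement(read2[:n])
        mismatches = sum(a != b for a, b in zip(read1[:n], candidate))
        if mismatches <= max_mismatch_rate * n:
            return n
    return None


# Synthetic example: a 40-base insert sequenced with reads longer than the insert.
insert = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCATGTCA"
adapter_r1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # R1 adapter quoted in this thread
adapter_r2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # assumed read 2 adapter for the demo
read1 = insert + adapter_r1
read2 = reverse_complement(insert) + adapter_r2

n = insert_length_by_overlap(read1, read2)
print(n, read1[:n] if n is not None else read1)
```

Once the insert length is known, both reads can be cut at that position without ever matching the adapter sequence itself.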
fastp cannot deal with some adapters in SE mode either. For example:
@K00159:45:H4W76BBXX:5:1103:11931:62171 1:N:0:CAACTA
GAGGACGAGGAGATCGGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAACAAAACACGCCTATCGCCGGCGAAGTCCGGAAACAAGCAAAACAAAAGTCCAACCGGGGGAGAGCCTAC
+
AAAFFFJ7JJJAFFFA-AJJJJJJJJJJJJJJJJJJJJJJJAJFJJJJJJJJJJJJJJJJJJJJJJJJFFJFJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJ-FAJJJFJA7FJJJJJJJJJJJJJJJJJJFFJJFJJFJFJF
@K00159:45:H4W76BBXX:5:1103:19157:62277 1:N:0:CAACTA
GAAAGTCGGAGATCGGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAGAACGGTGGTGCCCCGCTGGGAGAGGTTAACCACAACAAGAAAAATAAATGTCGCCACTGAGGCCAAC
+
AAFFFFJFJJJJJAFFJFJJJJJJJJAJJAJFJA<FFJFFJ<JJJJJJJJAFFJFJJJJJJJJJJJJFFFFJFFJJJJJJJJJJJJJJFFJJFJF<JJJJJJJJFJFAFAJJJFFJFFJFFJAAJJJAJJAJJJJJAFFFJJJJFJJJJJ
@K00159:45:H4W76BBXX:5:1103:15270:63032 1:N:0:CAACTA
AGATCGGGAAGAGCACACGTCTGAACTCCAGTCACCAACTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAATTTTTTTTTAATGTTCGGCGACCCCCCGGGATTTACCCTTCCAAAGTTTAAAAGCCCGCCGACCGGGCGGGGAAGGGA
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF
(Illumina TruSeq read 1 adapter)
Just thought I'd update this, as it would be nice if sequencing errors in the adapter sequences could be taken care of.
Adapter used for R1:
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Exact copies of this adapter get trimmed successfully, but copies containing sequencing errors (insertions or deletions) do not seem to be handled:
(The second sequence in each example is the adapter sequence without the error.)
e.g. 1
AGAGATCGGAAG**_**GCACACGTCTGAACTCCAGTCA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
e.g. 2
AGATCGGAAGAGCACACGTCTGA**_**CTCCAGTCA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
e.g. 3
AGAGATCG**_**AAGAGCACACGTCTGAACTCCAGTCA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
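As an illustration of what indel-tolerant trimming could look like for these three cases, here is a minimal sketch that finds the best approximate occurrence of the adapter in a read and cuts the read where that occurrence starts. This is not fastp's algorithm; `trim_adapter` and `max_edits` are hypothetical names, and the example reads are the sequences above with the gap marker removed. It only handles a complete adapter copy, not a partial adapter at the 3' end of a read.

```python
def trim_adapter(read: str, adapter: str, max_edits: int = 1) -> str:
    """Cut the read at the start of the best adapter occurrence, if close enough."""
    # prev[i] = (edits, start) for adapter[:i] aligned to a read substring
    # that ends at the previous read position and begins at index `start`.
    prev = [(i, 0) for i in range(len(adapter) + 1)]
    best = prev[-1]
    for j, base in enumerate(read, start=1):
        cur = [(0, j)]  # an occurrence may begin at any read position for free
        for i, a in enumerate(adapter, start=1):
            cur.append(min(
                (prev[i][0] + 1, prev[i][1]),                    # extra base in the read
                (cur[i - 1][0] + 1, cur[i - 1][1]),              # adapter base missing from the read
                (prev[i - 1][0] + (a != base), prev[i - 1][1]),  # match or substitution
            ))
        if cur[-1][0] < best[0]:
            best = cur[-1]
        prev = cur
    edits, start = best
    return read[:start] if edits <= max_edits else read


ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

examples = {
    "e.g. 1": "AGAGATCGGAAGGCACACGTCTGAACTCCAGTCA",  # one A missing from the adapter
    "e.g. 2": "AGATCGGAAGAGCACACGTCTGACTCCAGTCA",    # one A missing from the adapter
    "e.g. 3": "AGAGATCGAAGAGCACACGTCTGAACTCCAGTCA",  # one G missing from the adapter
}

for name, read in examples.items():
    print(name, repr(trim_adapter(read, ADAPTER)))
```

With `max_edits=1`, each of the three reads is cut at the start of the damaged adapter copy, while `max_edits=0` leaves them untouched, since none of them contains an exact copy.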