fastp
Not detecting the adapter in miRNAseq
It looks like fastp cannot automatically detect the adapter at all in miRNA-seq data.
For example, fastp does not detect the adapter in the single-end FASTQ file from https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR5087522
fastp v0.19.6 was run as: fastp -i SRR5087522.fq -o test.fq
The first 3 input reads look like this:
@SRR5087522.1
TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJ
@SRR5087522.2
NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC
+
#4BDFFFFHHHHHJJJJJJJJJJJJIJJJJIJIJCBHIJJJJJJJJJJJJ
@SRR5087522.3
NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG
+
#1=DDFFFHHHHHJJJJJJJJIIJFGHHIJJIEHHHACDFFFFFEDADDD
FastQC indicates that this FASTQ file most likely contains the Illumina Small RNA 3' adapter, which, according to FastQC's adapter database (https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt), is ATCTCGTATGCCGTCTTCTGCTTG.
According to Illumina's official adapter sequences document (https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/experiment-design/illumina-adapter-sequences-1000000002694-09.pdf), these are all of the Illumina small RNA adapters:
>Illumina Small RNA v1.5 3p Adapter
ATCTCGTATGCCGTCTTCTGCTTG
>Illumina RNA 3p Adapter (RA3)
TGGAATTCTCGGGTGCCAAGG
>Illumina RNA 5p Adapter (RA5)
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina 5p RNA Adapter
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina 3p RNA Adapter
TCGTATGCCGTCTTCTGCTTGT
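As a quick check of which of these adapters actually occurs in the three reads quoted above, the following Python sketch searches each read for an exact copy of the first 12 nt of each 3' adapter. The 12-nt seed length and the helper names (`first_exact_hit`, `scan`) are illustrative assumptions. This is not how fastp's own adapter detection works.

```python
# Reads copied from the issue (SRR5087522, first three records).
READS = {
    "SRR5087522.1": "TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA",
    "SRR5087522.2": "NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC",
    "SRR5087522.3": "NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG",
}

# 3' adapters from the Illumina list above. The 5' adapters are left out
# because they are not expected at the 3' end of a small RNA read.
ADAPTERS = {
    "Small RNA v1.5 3p": "ATCTCGTATGCCGTCTTCTGCTTG",
    "RNA 3p (RA3)": "TGGAATTCTCGGGTGCCAAGG",
    "3p RNA": "TCGTATGCCGTCTTCTGCTTGT",
}

SEED_LEN = 12  # assumption: match only the first 12 nt of each adapter


def first_exact_hit(read: str, adapter: str, seed_len: int = SEED_LEN) -> int:
    """Return the 0-based start of the adapter seed in the read, or -1."""
    return read.find(adapter[:seed_len])


def scan(reads, adapters):
    """Map (read id, adapter name) -> seed position for every exact hit."""
    hits = {}
    for read_id, seq in reads.items():
        for name, adapter in adapters.items():
            pos = first_exact_hit(seq, adapter)
            if pos >= 0:
                hits[(read_id, name)] = pos
    return hits


if __name__ == "__main__":
    for (read_id, name), pos in scan(READS, ADAPTERS).items():
        print(f"{read_id}: {name} seed at position {pos}")
```

In these three reads, the RA3 seed (TGGAATTCTCGG) starts at position 23 in reads 1 and 2, leaving 23 nt before the adapter. Neither the v1.5 seed nor the 3p RNA seed appears. Read 3 gives no exact hit at all: its closest match, GGGAATTCTCGG at position 11, differs from the seed at the first base.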
I will update the adapter detection feature in the next release; please help test it then.
Thanks
Ok.
Hey, just wanted to let you know that this sometimes doesn't work.
I think it would be better to introduce something like an error rate for adapter matching, similar to Atropos. As I understand it, they implemented that to compensate for systematic biases and 3'-end error rates.
Currently, I'm parsing the dnapi.py results and feeding them into fastp for correction.
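The error-rate idea can be illustrated on read 3 from the original report. The sketch below allows up to `int(max_error_rate * seed_len)` mismatches (Hamming distance) within a 12-nt RA3 seed. This is a simplified substitute: it handles substitutions only, whereas Atropos and cutadapt score an alignment that also allows insertions and deletions. The 0.1 error rate, the seed length, and the function names are assumptions for illustration.

```python
from typing import Tuple

# Reads from the issue (SRR5087522, first three records).
READS = [
    "TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA",
    "NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC",
    "NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG",
]
RA3 = "TGGAATTCTCGGGTGCCAAGG"  # Illumina RNA 3p Adapter (RA3)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings.

    An 'N' in the read is simply counted as a mismatch.
    """
    return sum(1 for x, y in zip(a, b) if x != y)


def first_hit_with_errors(read: str, adapter: str, seed_len: int = 12,
                          max_error_rate: float = 0.1) -> Tuple[int, int]:
    """Return (position, mismatches) of the leftmost seed match whose
    mismatch count is within the error budget, or (-1, -1) if none.

    Substitutions only: insertions and deletions are not considered.
    """
    seed = adapter[:seed_len]
    max_mismatches = int(max_error_rate * len(seed))  # 0.1 * 12 -> 1
    for pos in range(len(read) - len(seed) + 1):
        mismatches = hamming(read[pos:pos + len(seed)], seed)
        if mismatches <= max_mismatches:
            return pos, mismatches
    return -1, -1


if __name__ == "__main__":
    for read in READS:
        pos, mm = first_hit_with_errors(read, RA3)
        trimmed = read[:pos] if pos >= 0 else read  # cut at the adapter start
        print(pos, mm, trimmed)
```

With one mismatch allowed, the seed is found in read 3 at position 11, while reads 1 and 2 still match exactly at position 23. A lower rate such as 0.05 gives `max_mismatches = 0`, and the read 3 hit disappears again.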
Hi! I was trying fastp with small RNA data and, indeed, fastp does not detect these adapters.
So I tried supplying a FASTA file with the adapters to see whether that works, and I get this error:
ERROR: the adapter <adapter_sequence> can only have bases in {A, T, C, G}, but the given sequence is: adapters_miRNAs_illumima.fasta
I have attached the FASTA file.
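The error message suggests that the FASTA file name was passed to the option that expects a single adapter sequence (`-a`/`--adapter_sequence`), so fastp validated the file name itself as a sequence. fastp's README documents a separate `--adapter_fasta` option that takes a FASTA file of adapters. Check `fastp --help` to confirm that your version has it. The Python sketch below pre-checks such a file (every record non-empty and made only of A/T/C/G) and builds the corresponding command line. The helper names are hypothetical, and the sketch does not run fastp.

```python
from typing import Dict, List

VALID_BASES = set("ATCG")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; wrapped lines are joined.

    A repeated header overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:  # sequence lines before any header are ignored
            records[name] += line.upper()
    return records


def invalid_records(records: Dict[str, str]) -> List[str]:
    """Headers whose sequence is empty or contains anything but A/T/C/G."""
    return [name for name, seq in records.items()
            if not seq or not set(seq) <= VALID_BASES]


def fastp_adapter_fasta_cmd(in_fq: str, out_fq: str, fasta_path: str) -> List[str]:
    """Build (not run) a single-end fastp command that reads adapters from a FASTA file."""
    return ["fastp", "-i", in_fq, "-o", out_fq, "--adapter_fasta", fasta_path]


if __name__ == "__main__":
    example = (
        ">Illumina RNA 3p Adapter (RA3)\nTGGAATTCTCGGGTGCCAAGG\n"
        ">Illumina Small RNA v1.5 3p Adapter\nATCTCGTATGCCGTCTTCTGCTTG\n"
    )
    print(invalid_records(read_fasta(example)))
    print(fastp_adapter_fasta_cmd("SRR5087522.fq", "test.fq",
                                  "adapters_miRNAs_illumima.fasta"))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)`.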