Not detecting the adapter in miRNAseq

Open ndaniel opened this issue 6 years ago • 5 comments

It looks like FASTP is not able to automatically detect the adapter at all for miRNA-seq data.

For example, FASTP fails to automatically detect the adapter in the SE FASTQ file from https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR5087522

FASTP v0.19.6 was run as fastp -i SRR5087522.fq -o test.fq.

The first 3 input reads look like this:

@SRR5087522.1
TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJ
@SRR5087522.2
NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC
+
#4BDFFFFHHHHHJJJJJJJJJJJJIJJJJIJIJCBHIJJJJJJJJJJJJ
@SRR5087522.3
NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG
+
#1=DDFFFHHHHHJJJJJJJJIIJFGHHIJJIEHHHACDFFFFFEDADDD

FASTQC shows that this FASTQ file most likely contains the Illumina Small RNA 3' adapter, which, according to FASTQC's database of adapters (https://github.com/csf-ngs/fastqc/blob/master/Contaminants/contaminant_list.txt), is ATCTCGTATGCCGTCTTCTGCTTG.

According to Illumina's official document (https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/experiment-design/illumina-adapter-sequences-1000000002694-09.pdf), these are all the Illumina small RNA adapters:

>Illumina Small RNA v1.5 3p Adapter
ATCTCGTATGCCGTCTTCTGCTTG
>Illumina RNA 3p Adapter (RA3)
TGGAATTCTCGGGTGCCAAGG
>Illumina RNA 5p Adapter (RA5)
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina 5p RNA Adapter
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina 3p RNA Adapter
TCGTATGCCGTCTTCTGCTTGT
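
A quick way to see which of these adapters is actually present is to search the three reads shown above for each sequence. This is only a minimal sketch: it uses plain exact substring matching, and the names `ADAPTERS`, `READS`, `exact_hits` and `HITS` are made up for illustration.

```python
# Adapter sequences exactly as listed above (the two 5p entries share one sequence).
ADAPTERS = {
    "Illumina Small RNA v1.5 3p Adapter": "ATCTCGTATGCCGTCTTCTGCTTG",
    "Illumina RNA 3p Adapter (RA3)": "TGGAATTCTCGGGTGCCAAGG",
    "Illumina RNA 5p Adapter (RA5)": "GTTCAGAGTTCTACAGTCCGACGATC",
    "Illumina 5p RNA Adapter": "GTTCAGAGTTCTACAGTCCGACGATC",
    "Illumina 3p RNA Adapter": "TCGTATGCCGTCTTCTGCTTGT",
}

# The first three reads of SRR5087522, copied from the issue text.
READS = [
    "TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA",
    "NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC",
    "NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG",
]


def exact_hits(reads, adapters):
    """Map each adapter name to the number of reads that contain it verbatim."""
    return {
        name: sum(seq in read for read in reads)
        for name, seq in adapters.items()
    }


HITS = exact_hits(READS, ADAPTERS)
for name, count in HITS.items():
    print(f"{count}/{len(READS)}  {name}")
```

On these three reads only RA3 gets a hit, and only in the second read. The other two reads carry the same adapter with a single-base difference, so exact matching undercounts; a count below the number of reads does not mean the adapter is absent.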

ndaniel avatar Feb 05 '19 09:02 ndaniel

I will update the adapter detection feature in the next release. Please help test it then.

Thanks

sfchen avatar Feb 05 '19 14:02 sfchen

Ok.

ndaniel avatar Feb 05 '19 15:02 ndaniel

Hey, just wanted to let you know that this doesn't work at times.

I think it would be better to introduce something like an error rate for adapter matching, similar to Atropos. I think they implemented that to cope with systematic biases and 3' error rates.

Currently, I'm parsing the dnapi.py results and feeding them to fastp for correction.
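
To make the error-rate idea concrete, here is a rough sketch of mismatch-tolerant adapter matching. It is not how fastp or Atropos implement it: as far as I know Atropos (like cutadapt) uses an alignment that also allows indels, while this sketch only counts substitutions. The function name `find_adapter` and the parameters `max_error_rate` and `min_overlap` are assumptions chosen for illustration.

```python
def find_adapter(read, adapter, max_error_rate=0.1, min_overlap=8):
    """Return the leftmost 0-based start of an adapter match, or None.

    Only substitutions are counted (Hamming distance); indels are ignored.
    The adapter may run off the 3' end of the read as long as at least
    min_overlap bases overlap.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        allowed = int(max_error_rate * overlap)  # mismatches tolerated here
        mismatches = 0
        for a, r in zip(adapter, read[start:start + overlap]):
            if a != r:
                mismatches += 1
                if mismatches > allowed:
                    break
        else:
            # Inner loop finished without exceeding the mismatch budget.
            return start
    return None


RA3 = "TGGAATTCTCGGGTGCCAAGG"
READS = [
    "TGTAACAGCAACTCCATGTGGAATGGAATTCTCGGGTGCCAAGAACTCCA",
    "NAGCTTATCAGACTGATGTTGACTGGAATTCTCGGGTGCCAAGGAACTCC",
    "NCCCGGCGGCTGGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGTACG",
]

TOLERANT = [find_adapter(read, RA3) for read in READS]
EXACT = [find_adapter(read, RA3, max_error_rate=0.0) for read in READS]
print("10% error rate:", TOLERANT)
print("exact only:    ", EXACT)
```

On the three reads from the first post, the 10% setting finds RA3 in all of them (starts 23, 23 and 11), while exact matching only finds it in the second read.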

harish0201 avatar Feb 05 '21 17:02 harish0201

Hi! I was trying FastP with small RNA data and, indeed, FastP does not detect these adapters.

mdtorohernando avatar Feb 06 '24 14:02 mdtorohernando

In fact, I'm trying to pass a FASTA file with the adapters to check... and I get this error:

ERROR: the adapter <adapter_sequence> can only have bases in {A, T, C, G}, but the given sequence is: adapters_miRNAs_illumima.fasta

I have attached the FASTA file.

image
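
Judging only from the error text, the file name was probably passed to `-a`/`--adapter_sequence`, which expects a literal sequence; fastp has a separate `--adapter_fasta` option for a FASTA file of adapters. A small sketch of a wrapper that picks the right option follows. The helper names `adapter_args` and `fastp_command` are hypothetical, and the file names are the ones mentioned in this thread.

```python
import re


def adapter_args(adapter):
    """Return fastp adapter arguments for a literal sequence or a FASTA path.

    Assumption: anything made up only of A/C/G/T is a sequence,
    everything else is treated as a path to a FASTA file.
    """
    if re.fullmatch(r"[ACGT]+", adapter):
        return ["--adapter_sequence", adapter]
    return ["--adapter_fasta", adapter]


def fastp_command(in_fq, out_fq, adapter):
    """Build the fastp argument list for single-end trimming."""
    return ["fastp", "-i", in_fq, "-o", out_fq, *adapter_args(adapter)]


CMD_SEQ = fastp_command("SRR5087522.fq", "test.fq", "TGGAATTCTCGGGTGCCAAGG")
CMD_FASTA = fastp_command("SRR5087522.fq", "test.fq", "adapters_miRNAs_illumima.fasta")
print(" ".join(CMD_SEQ))
print(" ".join(CMD_FASTA))
```

Either list can then be handed to `subprocess.run(cmd, check=True)`. The same wrapper would also work for feeding an adapter predicted by another tool into fastp.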

mdtorohernando avatar Feb 06 '24 14:02 mdtorohernando