ERROR: FASTQ headers are not properly paired, see logfile and reformat your FASTQ headers

Open Karimi-81 opened this issue 3 years ago • 8 comments

Hi there, I tried to use RNA-seq data to train models with the funannotate pipeline. After preparing the genome (clean, sort, and mask), I ran the following command:

singularity run -B $PWD:/project/funannotate/ -B /home -B /project -B /scratch -B /localscratch funanotate.sif funannotate train -i MyAssembly_masked.fa -o fun --left "ERR21619_1.fastq" --right "ERR21619_2.fastq" --stranded RF --species "Neovison vison" --cpus 32

The program started and trimmed the reads with TrimmomaticPE; however, after that I got the following error:

[06/24/21 10:01:00]: gzip -f fun/training/trimmomatic/trimmed_left.fastq
[06/24/21 12:02:13]: gzip -f fun/training/trimmomatic/trimmed_left.unpaired.fastq
[06/24/21 12:05:20]: gzip -f fun/training/trimmomatic/trimmed_right.fastq
[06/24/21 14:06:10]: gzip -f fun/training/trimmomatic/trimmed_right.unpaired.fastq
[06/24/21 14:06:19]: Quality trimmed reads: ('fun/training/trimmomatic/trimmed_left.fastq.gz', 'fun/training/trimmomatic/trimmed_right.fastq.gz', None)
[06/24/21 14:06:19]: R1 header: 1 and R2 header: 2 are missing pairing as expected
[06/24/21 14:06:19]: ERROR: FASTQ headers are not properly paired, see logfile and reformat your FASTQ headers

The R1 and R2 headers were initially "@ERR216198.1 HWI-ST318:237:D0FBLABXX:5:1101:1312:1980/1" and "@ERR216198.1 HWI-ST318:237:D0FBLABXX:5:1101:1312:1980/2", but I changed them to 1 and 2 because the program showed a similar error about the headers in previous runs.

Do you have any idea what causes this error, and how I should edit the FASTQ headers so the pipeline accepts them? Thanks.

Karimi-81 avatar Jun 24 '21 18:06 Karimi-81

I ended up doing this: concatenating all R1 files into a single R1 file, and the same for R2.

pigz -cd fw*.fastq.gz | awk '{print (NR%4 == 1) ? "@" ++i "/1": $0}' | awk '{print (NR%4 == 3) ? "+" ++y "/1": $0}' | pigz --fast > FW.renamed.fastq.gz
pigz -cd rev*.fastq.gz | awk '{print (NR%4 == 1) ? "@" ++i "/2": $0}' | awk '{print (NR%4 == 3) ? "+" ++y "/2": $0}' | pigz --fast > REV.renamed.fastq.gz

Maybe one can combine the two awk commands into one, but this works :)
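
The two awk passes per file can probably be folded into one. A minimal sketch, assuming the same renaming scheme as above (a running counter plus /1 or /2 on both the header and the "+" line); renumber_fastq is a hypothetical helper name, not part of funannotate or Trinity:

```shell
# Renumber FASTQ records in a single awk pass.
# Reads FASTQ on stdin, writes FASTQ on stdout.
# $1 is the mate suffix: 1 for forward reads, 2 for reverse reads.
renumber_fastq() {
  awk -v mate="$1" '
    NR % 4 == 1 { n++; print "@" n "/" mate; next }  # header line: new running id
    NR % 4 == 3 { print "+" n "/" mate; next }       # separator line: same id
    { print }                                        # sequence and quality lines pass through
  '
}

# Usage (file names taken from the commands above):
#   pigz -cd fw*.fastq.gz  | renumber_fastq 1 | pigz --fast > FW.renamed.fastq.gz
#   pigz -cd rev*.fastq.gz | renumber_fastq 2 | pigz --fast > REV.renamed.fastq.gz
```

The pairing only holds if the forward and reverse files list the reads in the same order, since the ids come purely from the record position.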

HenrivdGeest avatar Jun 25 '21 06:06 HenrivdGeest

Thank you. I used the above commands but the issue was not solved:

[06/25/21 20:27:12]: Quality trimmed reads: ('fun/training/trimmomatic/trimmed_left.fastq.gz', 'fun/training/trimmomatic/trimmed_right.fastq.gz', None)
[06/25/21 20:27:15]: R1 header: 1/1 and R2 header: 1/1 are missing pairing as expected
[06/25/21 20:27:15]: ERROR: FASTQ headers are not properly paired, see logfile and reformat your FASTQ headers

Karimi-81 avatar Jun 26 '21 00:06 Karimi-81

Just a dumb question, you sure you removed all the previous output? If so, look at the headers yourself first in the working directory.
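
A quick way to do that check by hand is sketched below. It assumes the /1 and /2 suffix convention discussed in this thread; first_header and looks_paired are hypothetical helper names, and the comparison is only an approximation of what funannotate or Trinity test internally:

```shell
# Print the first header line of a FASTQ file (plain or gzip-compressed).
first_header() {
  case "$1" in
    *.gz) gzip -cd "$1" | head -n 1 ;;
    *)    head -n 1 "$1" ;;
  esac
}

# Succeed only if the R1 header ends in /1, the R2 header ends in /2,
# and both headers are identical once that suffix is removed.
looks_paired() {
  case "$1" in */1) ;; *) return 1 ;; esac
  case "$2" in */2) ;; *) return 1 ;; esac
  [ "${1%/1}" = "${2%/2}" ]
}

# Usage (paths taken from the log above):
#   r1=$(first_header fun/training/trimmomatic/trimmed_left.fastq.gz)
#   r2=$(first_header fun/training/trimmomatic/trimmed_right.fastq.gz)
#   looks_paired "$r1" "$r2" && echo "paired" || echo "not paired: $r1 vs $r2"
```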

HenrivdGeest avatar Jun 26 '21 08:06 HenrivdGeest

This is a Trinity requirement (funannotate just tries to catch the error ahead of time and alert you). To use SRA data, follow their example: https://github.com/trinityrnaseq/trinityrnaseq/wiki/How-do-I-use-reads-I-downloaded-from-SRA%3F. Otherwise, there are bound to be many issues about this on the TrinityRNASeq page.

But the problem with your original naming, @ERR216198.1 HWI-ST318:237:D0FBLABXX:5:1101:1312:1980/1, is the space after the ERR accession number. I think that if there is a space, the header is parsed as a default Illumina header, in which the read number comes immediately after the first space. So you might be able to fix your original names with something like:

sed 's/@ERR216198.1 HWI/@ERR216198.1_HWI/g'
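
As written, that pattern only matches the header of read .1, so it needs generalizing before it is applied to a whole file. A minimal sketch of one way to do that, assuming four-line FASTQ records; join_header_fields is a hypothetical helper name:

```shell
# Replace the first space in every FASTQ header line (lines 1, 5, 9, ...)
# with an underscore. Sequence, separator, and quality lines are untouched.
# Reads FASTQ on stdin, writes FASTQ on stdout.
join_header_fields() {
  awk 'NR % 4 == 1 { sub(/ /, "_") } { print }'
}

# Usage (file names are placeholders):
#   join_header_fields < reads_1.fastq > reads_1.fixed.fastq
#   join_header_fields < reads_2.fastq > reads_2.fixed.fastq
```

Restricting the substitution to every fourth line avoids touching anything other than the read names.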

nextgenusfs avatar Jun 28 '21 18:06 nextgenusfs

Thank you. I corrected the headers as you suggested and it worked well.

Karimi-81 avatar Jun 29 '21 13:06 Karimi-81

I am sorry to bother you again, but I got another error:

[Jun 30 05:26 AM]: CMD ERROR: Trinity --SS_lib_type RF --no_distributed_trinity_exec --genome_guided_bam fun/training/hisat2.coordSorted.bam --genome_guided_max_intron 3000 --CPU 32 --max_memory 50G --output fun/training/trinity_gg
[Jun 30 05:26 AM]: (None, b'perl: warning: Setting locale failed.\nperl: warning: Please check that your locale settings:\n\tLANGUAGE = (unset),\n\tLC_ALL = (unset),\n\tLC_CTYPE = "C.UTF-8",\n\tLANG = "en_US.UTF-8"\n are supported and installed on your system.\nperl: warning: Falling back to the standard locale ("C").\n')
[Jun 30 05:26 AM]: ERROR: Trinity de novo assembly failed

Do you think it is related to the header format of the initial FASTQ files? Thank you for your support.

Karimi-81 avatar Jun 30 '21 12:06 Karimi-81

Those locale settings look like a problem for Perl runs.

Set the LANGUAGE and LC_ALL variables to en_US and test that Perl works on its own. I am not sure whether it is using the system Perl (/usr/bin/perl) or something else.
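
Something along these lines, as a rough sketch. It uses the "C" locale as an assumption, because that one exists on any POSIX system; en_US.UTF-8 can be substituted if it is actually installed where Perl runs. Whether variables set on the host reach a Singularity container depends on how it is invoked, so that part needs checking separately.

```shell
# Assumption: "C" is used here because it is always available;
# swap in en_US.UTF-8 if that locale is installed in the environment running Perl.
export LANGUAGE=C
export LC_ALL=C

# Perl should now start without the "Setting locale failed" warning.
if command -v perl >/dev/null 2>&1; then
  perl -e 'print "perl ok\n"'
fi
```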

hyphaltip avatar Jul 01 '21 02:07 hyphaltip

Hi Jon, I ran into the same problem when running funannotate train with SRA reads downloaded from NCBI:

ERROR: FASTQ headers are not properly paired, see logfile and reformat your FASTQ headers

I used the following command to dump the SRA files to FASTQ:

fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files file.sra

I checked my FASTQ files: the headers contain no "_reverse" or "_forward" and no spaces, as shown below. What should I do?

(funannotate) Liangjunmin@dell:/mnt/data/liangjunmin/data/puccinia_triticina/Pt76/rna_seq$ head -n 4 SRR14386296_1.fastq
@1/1
CGGTTGTCAAAGTCTTCTCCACCCAGATGGGTGTCACCAGCGGTGGCCTTGACTTCGAAGATACCCTCTTCGATAGTCAACAGAGAGACATCGAAAGTACCACCTCCGAGATCGAAGATCAGAACGTTTCGCTCCCCAGTGGTCTTCTTG
+SRR14386296.1 1 length=150
FFF:FFF:FFFFFFFFFFFFFFFFFFF,FFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF
(funannotate) Liangjunmin@dell:/mnt/data/liangjunmin/data/puccinia_triticina/Pt76/rna_seq$ head -n 4 SRR14386297_1.fastq
@1/1
GGAATTGGAGCAAAGGATGCAAGAGTCTTGGGACAGCCATCACATCGACAGCCCTCCTAGTCGAAGCCACCCAACACCCCTTGAAAATATCTCGGCCAGCCGCCGTCGACGGTTCTTGGAGTGCTTTGACTCGAGCGATTCGGAGTCAGC
+SRR14386297.1 1 length=150
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFF
(funannotate) Liangjunmin@dell:/mnt/data/liangjunmin/data/puccinia_triticina/Pt76/rna_seq$ head -n 4 SRR14386297_2.fastq
@1/2
CTCAAGGCCTGGCTTGTAAAGTGTATGACCGGGCCAACGAAATATAAGGTGGGTTACCATAATTTGGGATTCGACGAGATAAATACCTTCTGCTTGAAATCTATTGATGGCCGGGTTACGGTGTCGGTTCCTCTGGCTCACCGTGGGCTA
+SRR14386297.1 1 length=150
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,

Thanks.

jimie0311 avatar Oct 12 '21 08:10 jimie0311