funannotate Error with funannotate update

Are you using the latest release? Yes, I am using it.

Describe the bug The information that I received from the program is limited. Here is an example of the command line output:

[Jul 21 06:56 PM]: OS: CentOS Linux 7, 80 cores, ~ 791 GB RAM. Python: 3.7.12
[Jul 21 06:56 PM]: Running 1.8.9
[Jul 21 06:56 PM]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[Jul 21 06:56 PM]: Previous annotation consists of: 5,528 protein coding gene models and 275 non-coding gene models
[Jul 21 06:56 PM]: Existing annotation: locustag=gene-SPAR_ genenumber=5804
[Jul 21 06:56 PM]: Trimmomatic will be skipped
[Jul 21 06:56 PM]: Existing BAM alignments found: Spar/update_misc/transcript.alignments.bam
[Jul 21 06:57 PM]: Running PASA alignment step using 552,888 transcripts
[Jul 21 10:27 PM]: CMD ERROR: /soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/pasa/alignAssembly.txt -r -C -R -g /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/genome.fa --IMPORT_CUSTOM_ALIGNMENTS /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/transcript.alignments.gff3 -T -t /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/long-reads.fasta.clean -u /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/long-reads.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --MAX_INTRON_LENGTH 1100 --CPU 16 --ALIGNERS blat --transcribed_is_aligned_orient

The log file indicates more or less the same:


[07/21/22 18:56:41]: OS: CentOS Linux 7, 80 cores, ~ 791 GB RAM. Python: 3.7.12
[07/21/22 18:56:41]: Running 1.8.9
[07/21/22 18:56:48]: fasta version=no way to determine path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/fasta
[07/21/22 18:56:48]: minimap2 version=2.24-r1122 path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/minimap2
[07/21/22 18:56:48]: tbl2asn version=no way to determine, likely 25.X path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/tbl2asn
[07/21/22 18:56:48]: hisat2 version=2.2.1 path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/hisat2
[07/21/22 18:56:48]: hisat2-build version=NA path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/hisat2-build
[07/21/22 18:56:48]: kallisto version=0.46.1 path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/kallisto
[07/21/22 18:56:48]: Trinity version=2.8.5 path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/Trinity
[07/21/22 18:56:48]: bedtools version=bedtools v2.30.0 path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/bedtools
[07/21/22 18:56:48]: java version=11.0.9.1-internal path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/java
[07/21/22 18:56:48]: /soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/Launch_PASA_pipeline.pl version=NA path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/Launch_PASA_pipeline.pl
[07/21/22 18:56:48]: /soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/bin/seqclean version=NA path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/bin/seqclean
[07/21/22 18:56:48]: minimap2 version=2.24-r1122 path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/minimap2
[07/21/22 18:56:48]: blat version=BLAT v36 path=/soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/bin/blat
[07/21/22 18:56:48]: No NCBI SBT file given, will use default, for NCBI submissions pass one here '--sbt'
[07/21/22 18:56:52]: Previous annotation consists of: 5,528 protein coding gene models and 275 non-coding gene models
[07/21/22 18:56:52]: Existing annotation: locustag=gene-SPAR_ genenumber=5804
[07/21/22 18:56:52]: Input reads: (None, None, None)
[07/21/22 18:56:52]: Trimmomatic will be skipped
[07/21/22 18:56:52]: Quality trimmed reads: (None, None, None)
[07/21/22 18:56:52]: Normalized reads: (None, None, None)
[07/21/22 18:56:52]: Long reads: (None, None, '/projects_eg/projects/jmontanes/EvolutionNanopore/CleanBamFiles/FunannotateFullPipeline/TranscriptomeMapping/FunannotateReads/Spar_def_reads.fa')
[07/21/22 18:56:52]: Long reads FASTA format: (None, None, '/projects_eg/projects/jmontanes/EvolutionNanopore/CleanBamFiles/FunannotateFullPipeline/TranscriptomeMapping/FunannotateReads/Spar_def_reads.fa')
[07/21/22 18:56:52]: Long SeqCleaned reads: (None, None, 'Spar/update_misc/nano-mrna.fasta')
[07/21/22 18:56:52]: Existing BAM alignments found: Spar/update_misc/transcript.alignments.bam
[07/21/22 18:57:14]: Running PASA alignment step using 552,888 transcripts
[07/21/22 18:57:14]: /soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/pasa/alignAssembly.txt -r -C -R -g /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/genome.fa --IMPORT_CUSTOM_ALIGNMENTS /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/transcript.alignments.gff3 -T -t /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/long-reads.fasta.clean -u /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/long-reads.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --MAX_INTRON_LENGTH 1100 --CPU 16 --ALIGNERS blat --transcribed_is_aligned_orient
[07/21/22 22:27:47]: CMD ERROR: /soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/pasa/alignAssembly.txt -r -C -R -g /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/genome.fa --IMPORT_CUSTOM_ALIGNMENTS /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/transcript.alignments.gff3 -T -t /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/long-reads.fasta.clean -u /users/genomics/jmontanes/Funannotate/FullTranscriptsRun/Spar/update_misc/long-reads.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --MAX_INTRON_LENGTH 1100 --CPU 16 --ALIGNERS blat --transcribed_is_aligned_orient

What command did you issue?

funannotate update -f $GENOMEFIX -g $GFF -o $Y \
--nanopore_mrna $dRNA --no_trimmomatic --pasa_db mysql \
--stranded F --jaccard_clip --species "Saccharomyces paradoxus" \
--cpus 16 --no-progress --max_intronlen 1100

If you think that I have to provide additional files I will add them ASAP.

Thank you for your help

Carlos

Jul 22 '22 08:07 JC-therea

Sorry for not complementing the thread with the PASA log file earlier. The last lines of pasa-assembly.log show the following:

[Tue Jul 26 10:48:19 2022] Running CMD: /soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/scripts/assign_clusters_by_stringent_alignment_overlap.dbi -M Saccharomyces_paradoxus_pasa -L 30.0 > pasa_run.log.dir/cluster_reassignment_by_stringent_overlap.out
// retrieving valid alignments.
// retrieving transcript coordinate data.
Will process 34 alignment assemblies...
Can't write /tmp/250803.1658828903.0.0825657811801506.clusters at /soft/EB_repo/devel/programs/noarch/miniconda3/2022-05/envs/funannotate/opt/pasa-2.4.1/PerlLib/SingleLinkageClusterer.pm line 59.

I hope that helps.

Thank you

Jul 26 '22 09:07 JC-therea

Hi @JC-therea -- I'm not sure what is causing the error. But seems like a PASA issue -- perhaps you can get some help from Brian at the PASA repository.

Is it possible that /tmp is full?
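
For anyone hitting the same "Can't write /tmp/..." message, one way to test the full-/tmp hypothesis is sketched below. This is only a minimal sketch and not part of funannotate or PASA: the function name `tmp_report` and the 10 GB threshold are made up for illustration, and it only looks at free space and basic writability on the node where it is run.

```python
import shutil
import tempfile


def tmp_report(path="/tmp", min_free_gb=10.0):
    """Report free space and writability of a scratch directory.

    `min_free_gb` is an arbitrary illustrative threshold, not a PASA requirement.
    """
    usage = shutil.disk_usage(path)       # named tuple: total, used, free (bytes)
    free_gb = usage.free / 1024 ** 3

    # Try to create (and automatically delete) a small file in the directory.
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            writable = True
    except OSError:
        writable = False

    return {
        "path": path,
        "free_gb": free_gb,
        "writable": writable,
        "enough": writable and free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    print(tmp_report())
```

On a cluster this would need to be run on the same compute node as the failing job, since /tmp is usually local to each node. If /tmp does turn out to be full, a larger scratch directory may help, but whether PASA honors an environment variable such as TMPDIR for this file is not confirmed in this thread.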

Aug 10 '22 00:08 nextgenusfs

Hi Jon,

That was my thinking too: maybe /tmp cannot hold the amount of data that I am providing. The odd thing is that it works for some species but not for others. I am trying to contact Brian, but he may be on vacation. If I manage to solve it, I will post the solution here.

One possible solution that I found, although I don't know if it will work, is to assemble the reads first and provide the assembly to funannotate. However, I don't know what the best option is for supplying reads assembled by an external program. I am thinking of passing them via --trinity or --nanopore_mrna, but I am not sure. What do you think?

Thank you

PS: The assembly was produced with rnabloom.

Aug 16 '22 09:08 JC-therea

Addendum: I noticed that some of my genomes are hard masked. Curiously, those genomes are the ones that take the longest and cause the most problems. Do you think that hard masking could be affecting the pipeline? If so, what do you recommend? Removing the Ns directly from the genomes may be too drastic, but I am not 100% sure about that. What do you think?

Aug 17 '22 10:08 JC-therea

Hi again, I am writing this for the record. I recommend assembling the long nanopore reads first, because it produces better results (at least in the UTR regions) and takes less time. That solved the problem in my case.

Aug 23 '22 08:08 JC-therea

I'm not sure about the hard masking issue and how it might relate to PASA assemblies. I generally always soft mask repeats, so that is what I recommend for this pipeline.
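
To see at a glance whether a genome FASTA is hard masked (repeats replaced by N) or soft masked (repeats in lowercase), a rough check along these lines may help. It is a sketch under stated assumptions, not a funannotate utility: `masking_fractions` is a hypothetical helper, and it cannot tell masked repeats apart from genuine assembly gaps, which are also written as N.

```python
def masking_fractions(fasta_lines):
    """Return the fraction of hard-masked (N/n) and soft-masked (lowercase) bases.

    `fasta_lines` is any iterable of lines, e.g. an open FASTA file handle.
    N and n are both counted as hard masked; other lowercase letters as soft masked.
    """
    total = hard = soft = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):   # skip blank lines and headers
            continue
        total += len(line)
        n_count = line.count("N") + line.count("n")
        hard += n_count
        # Lowercase bases other than 'n' are treated as soft masked.
        soft += sum(1 for base in line if base.islower()) - line.count("n")
    if total == 0:
        return {"total": 0, "hard": 0.0, "soft": 0.0}
    return {"total": total, "hard": hard / total, "soft": soft / total}


# Example with an in-memory record; for a real genome pass an open file instead,
# e.g. `with open("genome.fa") as fh: masking_fractions(fh)` (file name is hypothetical).
example = [">chr1", "ACGTNNNNacgt", ">chr2", "ACGT"]
print(masking_fractions(example))
```

A high `hard` fraction with `soft` near zero suggests a hard-masked assembly. In that case, going back to the unmasked assembly and soft masking it is likely a safer route than deleting the Ns, which would shift every coordinate in the existing annotation.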

Aug 23 '22 14:08 nextgenusfs