BRAKER
prothint seems to hang on some records
Dear,
I completed a very similar first run with Docker (an ONT assembly, everything else identical). That run finished after a few hours and gave results.
The second run (PacBio assembly) hangs on a ProtHint record. I stopped the first attempt after it had hung for 2 days and restarted from scratch in a new folder, and it hangs again at the same point.
using teambraker/braker3:latest; v3.0.6
image="teambraker/braker3:latest"
# get database for this sample type from https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11/
# wget 'https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11/Viridiplantae.fa.gz' && gunzip Viridiplantae.fa.gz
orthodb="Viridiplantae.fa"
# using orthodb proteins only
docker run \
--rm \
-it \
-u "$(id -u):$(id -g)" \
-v $PWD:/data \
-v $AUGUSTUS_CONFIG_PATH:$AUGUSTUS_CONFIG_PATH \
-e AUGUSTUS_CONFIG_PATH=$AUGUSTUS_CONFIG_PATH \
${image} \
braker.pl \
--species=${species} \
--useexisting \
--genome=/data/${outfolder}/${asm} \
--prot_seq=/data/${orthodb} \
--workingdir=/data/${outfolder} \
--threads=${nthr}
Some of my terminal output (full logs attached: braker_firstrun.log, [braker.log](https://github.com/Gaius-Augustus/BRAKER/files/13489415/braker.log)):
#**********************************************************************************
# BRAKER CONFIGURATION
#**********************************************************************************
# BRAKER CALL: /opt/BRAKER/scripts/braker.pl --species=Chlamydomonas reinhardtii --genome=/data/pacbio_results/pacbio_draft_assembly_softmask.fasta --prot_seq=/data/Viridiplantae.fa --workingdir=/data/pacbio_results --threads=48
# Tue Nov 28 08:56:41 2023: braker.pl version 3.0.6
# Tue Nov 28 08:56:41 2023: Only Protein input detected, BRAKER will be executed in EP mode (BRAKER2).
# Tue Nov 28 08:56:41 2023: Configuring of BRAKER for using external tools...
# Tue Nov 28 08:56:41 2023: Trying to set $AUGUSTUS_CONFIG_PATH...
# Tue Nov 28 08:56:41 2023: Found environment variable $AUGUSTUS_CONFIG_PATH.
# Tue Nov 28 08:56:41 2023: Checking /opt/biotools/Augustus/config as potential path for $AUGUSTUS_CONFIG_PATH.
The current command is:
# Tue Nov 28 09:21:13 2023: starting prothint.py
/opt/ETP/bin/gmes/ProtHint/bin/prothint.py --threads=48 --geneMarkGtf /data/pacbio_results/GeneMark-ES/genemark.gtf /data/pacbio_results/genome.fa /data/pacbio_results/proteins.fa
...
[Tue Nov 28 09:43:09 2023] Enqueueing pair 246182/248920 (98.9%). Est. time left: 00:00:13 (hh:mm:ss)
[Tue Nov 28 09:43:11 2023] Enqueueing pair 246431/248920 (99.0%). Est. time left: 00:00:12 (hh:mm:ss)
[Tue Nov 28 09:43:12 2023] Enqueueing pair 246680/248920 (99.1%). Est. time left: 00:00:11 (hh:mm:ss)
[Tue Nov 28 09:43:16 2023] Enqueueing pair 246929/248920 (99.2%). Est. time left: 00:00:10 (hh:mm:ss)
[Tue Nov 28 09:43:16 2023] Enqueueing pair 247178/248920 (99.3%). Est. time left: 00:00:09 (hh:mm:ss)
[Tue Nov 28 09:43:18 2023] Enqueueing pair 247427/248920 (99.4%). Est. time left: 00:00:08 (hh:mm:ss)
[Tue Nov 28 09:43:19 2023] Enqueueing pair 247676/248920 (99.5%). Est. time left: 00:00:06 (hh:mm:ss)
[Tue Nov 28 09:43:20 2023] Enqueueing pair 247925/248920 (99.6%). Est. time left: 00:00:05 (hh:mm:ss)
[Tue Nov 28 09:43:21 2023] Enqueueing pair 248174/248920 (99.7%). Est. time left: 00:00:04 (hh:mm:ss)
[Tue Nov 28 09:43:22 2023] Enqueueing pair 248423/248920 (99.8%). Est. time left: 00:00:03 (hh:mm:ss)
[Tue Nov 28 09:43:23 2023] Enqueueing pair 248672/248920 (99.9%). Est. time left: 00:00:02 (hh:mm:ss) # hangs here
Any idea what this could be and how to work around it? Here are the running processes (with one thread at 100% CPU):
u0002316 55913 0.0 0.0 17504 12368 pts/0 S+ 10:21 0:00 python3 /opt/ETP/bin/gmes/ProtHint/bin/prothint.py --threads=48 --geneMarkGtf /data/pacbio_results/GeneMark-ES/genemark.gtf /data/pacbio_results/genome.fa /data/pacbio_results/proteins.fa
u0002316 64582 0.5 0.0 3777148 481632 pts/0 Sl+ 10:23 1:38 perl /opt/ETP/bin/gmes/ProtHint/bin/run_spliced_alignment.pl --cores 48 --nuc ../nuc.fasta --list /data/pacbio_results/diamond/diamond.out --prot /data/pacbio_results/prot0cu2zsy0 --v --aligner spaln --min_exon_score 25 --longGene 30000 --longProtein 15000
u0002316 1420444 0.0 0.0 7492 3976 pts/0 S+ 10:42 0:00 bash /opt/ETP/bin/gmes/ProtHint/bin/spalnBatch.sh batch_2233 batch_2233_out 25 0 30000 15000
u0002316 1434514 99.9 0.1 1242452 763540 pts/0 R+ 10:42 295:23 /opt/ETP/bin/gmes/ProtHint/bin/../dependencies/spaln -Q3 -LS -pw -S1 -O1 -l 23802 nuc_223256 prot_223256
u0002316 1434515 0.0 0.0 1992 4 pts/0 S+ 10:42 0:00 /opt/ETP/bin/gmes/ProtHint/bin/../dependencies/spaln_boundary_scorer -o nuc_223256_prot_223256 -w 10 -s /opt/ETP/bin/gmes/ProtHint/bin/../dependencies/blosum62.csv -e 25 -x 25
Thanks in advance
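To spot the stuck alignment in a listing like the one above, a small filter over `ps aux` output can help. This is a minimal sketch, not part of ProtHint: `find_stuck_spaln` is a hypothetical helper name, and it assumes the BSD-style `ps aux` layout shown above, with cumulative CPU time as `MMM:SS` in column 10 and the command in column 11.

```shell
# Hypothetical helper (not shipped with ProtHint or BRAKER).
# Reads `ps aux`-style lines on stdin and prints, for every spaln process
# whose CPU time is at least MIN_MINUTES (default 60):
#   PID  CPU-minutes  last-argument (the prot_* file of the pair)
# Assumption: TIME is in column 10 as MMM:SS and the command is in column 11.
find_stuck_spaln() {
    awk -v min="${1:-60}" '
        $11 ~ /\/spaln$/ {            # the spaln binary itself, not spaln_boundary_scorer
            split($10, t, ":")        # t[1] = minutes, t[2] = seconds
            if (t[1] + 0 >= min) print $2, t[1], $NF
        }'
}
```

For example, `ps aux | find_stuck_spaln 120` lists spaln processes that have used at least two hours of CPU time. Applied to the listing above, it would report PID 1434514 together with `prot_223256`.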
Hi @splaisan,
This looks like the ProtHint issue reported at https://github.com/gatech-genemark/ProtHint/issues/14.
A quick fix is to remove the protein in prot_223256 from the input protein set.
I will look into patching ProtHint to fix this, but that may take a while.
Tomas
Hi @tomasbruna ,
I found the suspect file in the local spaln subfolder of my output folder and can easily remove it, but how do I restart the Docker run without recreating it?
cat nuc_223256
>6434_g
CGGCAGGTCCCAAGGAATCGGCAGCCCTGGCAGCTGACATCCTAGCAGCGGGCGGCTCTTACGTGGAGGCCCCGGTGCTGGGCAGCCAGCCTGAGGCGGAGAAGGGCACCCTGCTGGTGATGGTGGGCGCGGAGGCCGACCCCCGGGAGCCCGGCAGCCCGCACCACGACACCGTGTGGCCGCTGCTGCGCGCGCTGGGCCAGGAGTCCAACATCCACTTCATCGGGCCGGTGGGCACGGGCGCGGCGGTCAAGCTGGCGCTCAACCAGCTCATTGCATCGCTCACGGTGAGAGGAAGGGTAGGAGGGGGGAAGATAAGGAGGAGCCGAACAGTTGGGCGCTTGGAGGTTTGGGGATTATGGCCAGGCTGAAACGGGGTTGCTGTTTGTGTCGATGCCCCGGTTGCCCGACTCCTGCCTCTGCGCCCCCTGCCGCCTTGTCCCACCTACCCTTGCAGGTGGGCTTCTCCACCAGCCTGGGCCTGGTGCAGCGCAGTGGCGCTGACGTGGACAAGTTCATGAGCATCCTGCGCGCCTCCGCACTGTACGCACCCACCTACGACAAAAAGCTGCAAAAGATGCTGGACCGGGACTACGGCGCCGCAAACTTCCCCACAAAGGTGTGTGGACAACGAGACGCGCAGAGGCATGCAACCTGTCGAGCTCTTATGTCCACGCATTGTAAACTTAGCCAGCGACATCAACTGCGGATAAGCTCAACCGTGCCCGCACCTTGGTCCCCTCGTCTCCCCACCCGCAGCACCTGCTGAAGGACGTGCGTCTGTTTGAGATGGAGGCTGCGGCTGCGGGTCTGGACACGCGCCTGCTGGCGGCGCTGAAGGGCGTGGTGCAGGACACTGTGGACCGCGGCCTGGCCAACACCGACTACTCGGCGGTGTTCGACGCAGTGGCGCACCCGGGAGAGCAGCAGGCGACCAAGCCGCAGCAGTAAAGGAGAAGGCTAGGTGTGCGGTATCTGAGCTGGCGCCTGGGCGCATTGCGGCATGCGCCCAGGCTACCGGAGCACAGCAGAAGTCGCGCGGGAGGTGCACGAGCAGAGCACAGGGCGCATGACACGAGTAGATTGAGCAACGCAGCGTCGAATGATATATGTGCGCGTGGGCGGCGGCGACGGTGGCGGTAGGGTTGGCTCCGTGCCTGCTGCTATGTGCTTACTTTGCCCGCATCAGGCACAAGGGTAGATACGATACACCACCTCGTTGCGATGCAATATGCGCATGGGGCTGCCGCCCTGCAATACGGGTGGGAGATCAAGGGTCATCACTCATCAGAGTGCGTAAGCACGCTGAGAAGGCCAGAAGGGGCATGCACTCCAGAAACGCTTTGCTGGGGCGCGTCAGGCAGCATGTCACATGCCTCGCCGGAACGTGGGCTGCTGGGGCTCTGAGTCGAGGCGAGTTTTGCGACAGCAGTAACCTAGGCTAACCTCTATCAAAGGTGCCCGCAGTGCAGTGGCGGCGTAGTAACGTGGGTAGCGTGCGTGGGTAGTAAGCACACGGTTCTTCACTCCTGGGCGGTTTTGTAGCATTGAACTGTCCAGTGCAGGCTTTGAGTCCGCAGCAAATTTCCAAATGATAAGCTGCGTTTTCCGCGACGAGCTTCTGCAACTTGTGAGTGGTCCTTTGAGTGGTACTTCCTCGATCAAGGCATAACCTGGATAGCAAAACGCTACCAAGGTGCTTTCAAATATAAAGAACCACCTCATACGAAAACGCCGGCAGTCCCAAGAATTCATGGCCGCGAAACCACAATGCGTGCATGACACATACATCCCCCGCTCTTCGTTCATTTTTTCGACCTAAGAACACTGGAGTTCCAGTCAATAACGATTCAGAGTTCAACGCATGCACAAGATTTCGCCATGCACAAAGCCAGCTAACGCTGTTCTGCGCTTCTAATTACAATGTGTCGTCACTGACTGCTATCGAGGGACTCGCTGTTGCGTTTTTTACAAGGAATAATCTGCTTGAGTCGGACGCA
ATGAAGGAGGTGCGTTGGGGAGCAAAGGTGGGGGCGTTTTCGACAGAAGGGTCCAGGGCCAAGGCGTCGACCTCCCCGTCAGGTTTCATTCTAGATTTGCCAAATATTGCCAAATATGCCAAATATTGCCAAATATTATGATAATGATATTTGCATCGCACCTGTGCTAAGCGCGTATTGACGAGCGTGGGCGAAGCGTTTGTCAGCGGCACCTATGCACAGACCCGGCGTGCATACATTTGCAATAGGACTGCTTATCATCTAGATAAACATTTCCACCCACGGGTGTCACCGAGGATGGCCCTCGTCGCTTTTGCTTGTCGCCAGCCCTGCCGGGCTGCTTGCGCTGCTGTGGCCTTTCCCTGGCCACACTCTATATTGTGTTCCAAGCATATCTTGCATTGCGAACACCAGTTGAGAAACTTGCCGAGCCCGCTGTCAACACCCGCTCAACTGCCCCAAACTTTACGCGACCGCGACAAGCACTGAAGATTAAAGACCACGCTGTGGAGAAGTCCTCCGGGGCTTCAAAATATGGCGCCGTGACATGCTGGTTTCTGTTCGGGCGTCCGCCCAATTCAGCCCTTGAGGTACACTCACGGCTCTGCCTGCGAATACCCATCCACACAGAGGTACGCGGTCATCGCTCCGCTGGGTTCCGGCGCCTATGGCTGCGTGTATAAGGTATGTGGAACAGGCTGGCAGCAGTGTGAAGCGGGGACCTTGGGAACAGCGGTGCCCCGTGGGAATTGGCGGTGTGACCTTGGGTGCGACGGGACAGGGTAAGAGGCAGGTTGGCGTAGCCCCGTGGCTGATATAGCTGGGTCCCGAGAAACAAGTTACGCCCAACCCAGGCGCCAATGAGACATGGAAATACGTCGCCTCCGTGAGAATCGTGGGGTGAGAGACACACTCATGAACACGCCTCCCCTCCTCTATCCCTGTAGTGCCTGGATCGCGACACGGGCAGCCTGTGTGCGCTCAAGGTCATCAACCTCGCACATCAGGAGCCCGCGGTGAGTTCCAAGCACCACAGCTGCACCAGTCAGTTCTGAGTGCGGGGCCACGCGGCTGGCCCAGCTGCCCAGCATCGCAGAGGCGTGTATGGCATCAGTATCCTTGGTCACCGGCATTCCTGCGATACAGCGTTAAACTCCCCATCACGTGTTTACGCTGGTGTCGAAACTGCTGACGTGCCTGTGTGGGTGGGCGCCGGTGCACAGGTCATGCGGCTTACCATGCGCGAGGTACGCACGCTGCAAAAGCTGCCAAAGCACCCGCACATTGTGGAGTTGAAGGATGCGTTCAAGAGCTCGGGCAGCGGCCGCGTGTTCCTGGTCTTCAGCTGCGAGGGGCGCAGCATGCATGAGGTGCGCGATCGCGGGGCAACGCGTGAAGGGCGGACGGGAACCTTCAGCTGTTGACTTCCCAAGGCCCCTCCAGGCTGCCCCTGTCAGCTTGACTTACTGACTGAGCTGTATGGTATGCCGCACACTCGCGCTCCGGCAGGAGGCGGAGAACTACGCCAAGTATATCCTGCCGGGGCCCATGCTGCGCCAGGTGGCGTGGCAGTTGCTGCAGGCGCTGGCGCACATACACGAACACCAGGTGCGTGTGTCCAACCGAGTATGTGCAAGGCGCGTTCGTGTGACTGGCGGTCTGTCGGGGCGCGTTGTCTCCAAGCCCGGGCGTATTTCAGAGTCCTGCTGACCGCGCGCCCACCACCGCCCACCACAACCCTCGCGCGTGCATCCGCACGCACCCAGATTATCCACCGTGACGTCAAGCCCGGCAACATCTTGCTGGTGGGCGACGGCACCGGCGGCGCGGCGGGCGTGGGCCTCAACGGCGCCGACGTGCACATCCGGCTGGCGGACTTTGGCTTTGCCCGCAGCTGGCAGCCGCACGAGGCGTTGTCCTCCTACGTGGCCACGCGGTGGTTCCGTGCGCCAGAGGTGGGTGCCGATTTCGGTTTTGATGGTTCTTGTCAAGGTGTGGCT
TGCTGGGGCGCATGAGGTGGTTCGGTGGCAGCTGATCCAGCGCGGTGGGCCGTGTCTCGGGTGGGTGAGCGTTGCACCTGCGGACACCGCACGCTAACCTCCGCACGCGGGGCCTGCGGTCGCAGATCCTGGTGCGTGGCAAGTACAGCTTCAACAGCGACTGCTGGAGCGTGGGCTGCACCATTGCCGAGTGAGTCGCGCGTTGGGGCTTGGGGCTTGGGGCTCGGGCCATGCACTGCCTTGTCGGCTGAGAGGGTACGGTATCCAATCGCGTAGCTGCGCGAGGGCGTGGCGGCACGGTCTCGAACCCACGCACGGCCACCGCACCACTGACGCCCGGACGCTCCCTTCCACTGCCTTCGTTACGAGCAGGCTGGCGGTGGGTTCGGCCCTGTTCCCTGGCACGTCCACCATCGACCAGCTGGCCCGGATCATGCGCGCCACGGGACCGCTGCCGCCCTCGTTAGCGGCGCAGATGATGTCGGACCGAACTCTGAGCCCGCTGGCGGCGCAGCAGCGGCGGCCGCCGAACCGCACCCTGCGCGAGCGCCTGCCGGTCGAGGCCCGACTGTTTGAGTTCCTGGCCGCCTGTCTTCAGGTGGACCCGGCCCGCCGGCCCAGCGCCAAGGAGCTGATGCAGATGCCGTACTTTTGGGACATCGTGCCGCGCAGCCGTGCCCTGCCCAAGGCCTCCATGGAGGCAATGGCGGCCGCACGTGACGCCGCCGCCGTGCAGATAGCGGCGGCTGAGGCTACCATCGCAAAGCCGGCGGCGCAGCCGGCGGCCGTGGCTGTGGCCGCGCCCGCGGCGGCGGCTCGCAAGGACGTCGTGCAGGTGGAGGCCAAGGGTGCGGCGGCCGCGCCGGCGGCATGCGGCGCGGTAGCGGGCGCAGCTGCCAAGTCCAGCGGCACGGACAAGGCGGCGGCCGGCGGTGCTGGGGGCCAGACGGCCTCGAGCAGCGTGGCGGCACCCATGACTACCACCCGTACTGCAAGTGAGGCCCAGGCCATGAGCCTCTCGGCCGTCGCTTGCTGCCCGGGGACTGACCGCGCGTCGACAGCGGTGCCCCCTACGGCGCCGGCGCAGCTGGCCGCTGCACCTGCTCAGGGCACAGCAGCAGGGCTCAAGCCTGCAACCAGCGTGGTGATCTCGGTGAAGGCAACTGCTGCGTGCGGCCGGGACCAGCCAAGCGCGCCGATGACTGGCTCGAGCCTCAGCACCCGCGACCTCGCGAGCATGAATCCCGCCGCTATGCCAGCGCCTGCCAACTCACAGGGCAGTGGTGTGACATCGGTGCCAGCATCTCAGGCAGCGGAGCAGGCGGCCGCCGCTCCCTCAGCGGAGCCGCCCCCACGCGTTGTGTGTATGCCCGACCTCACCAGCGTGAGCACCTTGGCATCAGGTGCGGCGGGGCCGCAGCCCGCGCAGCCGGCGCGGGCGCGCGCGCCGGCACCGGTGGCGGACGCGTCGCCGGAGGACGCGTCGCCCCGGCAATCCAGGACTGAGCGCGAGCTGCAGCGGCCACAGGCGGCCGTCACGCTTGTCACTAGCTCTAGCCTGTTCCCATCACCGCTGCCAGCACCGCTGCCTCCGCCGCAACCAGTTGCAGTGGAGGCGTCGTCGCCGTTCACGCTCGTGGTTGCTGACACTCTGGGTGGCGCTGCGGCGGGTGCCGCAGGCGCCGCCGCCCCAGGCGTCGCAGGCGCAGTCGGCGGTGACAGCACGCCGCGCAGCCACACCACAGCGCGCATGCTGGACCTGCCCTCCAATACCGTGGAAATGTTCATATCGCCCACCACGTCGGTGGCAATGCATCGGCTGCTGCCAGCTGTGATGACGCCAGTAGGCGCACCGCCGCCGGCCACGCCCAGTGCCGCCGTGCGCTTGCGGCAGCTGATGCCGCACTGCCGTGCGCCGGCGGGCGCGGTGCCGCCGGTCCTGACCTACGGGATGCTGTCACGCAGCAGCACTCTGGAGCTGGACATGACGGGCAGTGCGG
CTGCGGCTGCAGCGGTGGCGGCCGCTGGCGTTTGGGGAGGAGATGGAGGAGCGAGCGGGGATGGTTATGGCGTGTCGTTGGCGAATGGGGCCTCAGCGGGGCAGCTGCAGGCCCACATACAGATGCAGCAGCAGGCGGCGCAGCGGCATGCCCCGGCGGCGGCCGCGAACAGGGCGTGGCGGCGGGCGGGTCGCGCGTCGGTGGAGTTTGCAGACCAGCTGTCATGGCCGGCAAATACCAACCAGCCCGACCAAACGGTCAGCGGCGCCAGCACAAGCAGCAACATTTGGGCCAGGGCTGTCACTCCTGGAGCCGGCGCCGCGCGCGTTGGCGGCAGCGGCGGCGCAGCCGCCACTGGCACCCGAAATGTCACTTCCGCCGCCATTATGCGTCGCAGCTGGCGGCTGCTGCCGTACCGCACAACGGGCGGCAGCCCCGGCTTCATGCCCGTGCCCACGCTGGGCGACGAGCCAGCTGCGGATACGCCGTCCCTGCACACGTCAGGCGCGGGCGCCGTCGCGTCGTTGGTCAATGCTGCCGCGGGCCTGGGCCGCCACAACAGCCGCTCGCAGGCGTCCTTTGTCCGCAGCATGTCGCGGATGTCGCAGTGCCACGCGATGCCCTCGGGCGCCCTGGACGTGTCATCTGCGGGCCATGACAGCTCAGTGGACGGCGCCGGCGGCTTTTGCTCCGCGTACGCAATGGCGAACGCATCGGCTGGAGCGACATCCTCGCCACTTGTGGGCCTGGTGACCACGCCGCAGCAGCCGGCTAAGGCGCAGCAGCTGCAGGCACAGCTGCAGAGAAACGGGTCCACAGTCGGCGGCGCCGTCGCACAGTCGCCGCCCATGCTTTACGGCCTGGTGCTGGCCGCAAGCAGCGATTCGCCGTCCCGCACGCGCCGCGCCGCGAGCGCCGTGCTGCCCAGCTTTCCTGCAGCCAGCGTACCGGGAGCCCACGCCACCCCTGTCACGTACACTGGTGCCAGCGCCGCTGATGCCAGCAGCAAAGGCCCCGCCAGCGTGGCTGCAGCAGCAATGGCCCTTCTCCTGCGGTCGTCGTCCCAGCAGCAGCGTGCTGCCACCGCCGCAGGGCATGTGCCGCACGGCACAAGCCGCCTCGCAAACGCCGTCAGCTCCAACCTGTGCGATTACCCCTCTGGGGACGCGGACATCGCGCCCACCGCCGGCACCCCTCAGGCGGGAGCCTCCGCCTCGGCCTTTCCCAGCGGCACGCCGATGGGCACAGCGACCGACTCGGGCGCCGTGCGTCGTGCACTCGGCTTGTCCTGGCAGGTGCTCCAAGCCGTGGGCTGCAGCAGCAACGCCGCGGCAGCTGCGTCCACGGCCTGCTTCGACAGCGCCGCCTCCGCCACCGTCGCAATGGCACAGGCCGGCGCCGTGTCGCTTGACGCAATGCTGGCTACTGGAGGCGGCGATGGCGGCGGCGCCCCTGCAGATTGCGGCCTTACCGCTTCGGCGTCGGCAGTGGCACGCTTCCCCAGCGCTAGCCTGCTCACGGCGGGCGGCGGGGCCGCCAATGGTGCCTACGTGCCCCACGCGATTACCGAGGAAGAGAACGAGCTAGCATACGCGGCAGCAGCGGATGCGTCAGCTGCCGGTGAAGCTATGGGCGCGGGGTGCAGAGCCAAACATGTGCTGGACAACTCGGATGGATGCGTGCGTCTGGCTGGCTCAAAGGACACGGCAGCGGGCATGGCGCACCTGCAGCAGTCTGCCACCACGCAGCATCCCTTGCCTGCGCGCACGGCATCCCCGGGTGGACGCCGCCAGGGCGCACATGACAGCAAGCAGCGGCCAGGGCTGCTTGCCCGTCTCTTCGGCTGTGGCCGCTTTCGCAATGACCAAATTTGAAGCAACATGTGAGACAGGCCGCCGCTTGTCGGATGGTACGTGTTGGCAGATTTGACACGGCGTCGGGCCTCGGGCCCCGGTTGGGGATAGCAGTGTGTTTTTGGGTGTGGCGGGCCGGACCCACTT
GACCGTACGGTAATGCTTAGGTACGGAGCTCAGGGTTCAGGCTGTGCACTTGTTTCTTCTTCTGATATGAATGCGACATTGCATATGCAATGAGGTACAATGATATACTGGGTATTTGCTTTGCCTTGGACGTGAATGCAGTAGCCGGACATGGAACATGGGTTTCGACATGACCGTGTGTGTTCGCGGTAGAGTTGTGCACACACACCAGGCTTGCCTAAGGGTGGGCATGGGGATACTTCAATTAACGAAGGTCACGTTTTAGGAGTGTTTTTGGGCGGAGCGGGAGATGAGGTAGACGCTTGCGGCCCCAGACGGGAGGCGTCAACTATCAAGTTGATCCCATTTATTCCATATGAACATGGCTGTAATGATGCGGCCCGTGGAAGTGTGAATGGGGGGCTGTTCCATGGATGGGTGAGTTTAAATGTTCCCGGTCGCAGTGGGCTCTCGTGCAACCAGGTCCGGATTTTGCGCGGTATGGCTAATTGGTCGTGCCGACGTGAACAGGGGCAGCAGTACGTACTGTCCGTTTGTTGCATTAGCATTCATGATTAGGGGAGACCGCAGCATTTTAGCCCTGGGGCTAAGGTTGTTGAGAAAGAGCACCAGAGCATATGGAGATGTCGCTGTACTTCGGACGAGTACGCCTGGAGGCTGAAAGGAACCTTGCTGCGGTTTGTACGACGCAGACAGATGCTCGCACGGTCTTGCAATGCAAGATGACGGTCGAGTCGTATACGTGCCATGATGATGTTGTTTAATGCTTCACCAGTTGACCGATTATCGCTGATGGGCGCTACAGACAGGGAATGTCCTAACATGGACAGCTGCGAGCAGCTCATTGCGCTGGAGTGTGAATGGAGCCAGAGAAGTCTGAGCAGCCTTGCAAATGGAGATGCGCAGTATGCTTGGTGAAGGAGCTAAGCCCTGCATCAAAGGCCGGAGATATTTGGGGTACATGACGCAGGTCACGAGCTTGCGTGCAACCACAAGTGTGGTCGTCCAGCTTTAGATCTGGGGGGCGTGCCAACAGTGACCCCCACGCACGTTGGCCGGAACGTGTGTGTGGGGGGGCTCGGGTTCTAGTTGGCAATGGGTGCAGGCGGTGCGGTCTGTGCGAGGCGGGAATCTTTTACAGTTTGCCCAGGGGCGGCAGCCGCTGCAGTGTTGGCTTGAGAAGCAATGTCTTAGGCATGAGATGGGAAGGGAACATGGGCAGGGAGCTTCGTGACGTGGGGCCGAGTGAAGGACGTACTCTGTGGAGTCTGCGCCTTGGGCTGGTATGCGCTGCTCCCATGAAAGCGCCACAATATGCCATGGGATTTTTGTCTGATGCCTACCAGTAATCATCTATCAAGTTGGGACCTGTACGTCATCTTCTTCCGTCGCTTGGTTGCCTCCATCTGCAGGTGAGCGGCCAAGCACAGCCAGTCACAGCTAGTTGCTAGCGTACACGTTCCAAACACTATCCTACCAGCTGTTGTCCATGCAGCCGTCCGCTGCTGTTGCCTGGCGGAGTGCTGGTGCACCCTGAGGTGTGACCTCCACCTGACCCCTCATCGTACAGCTACCAGTCTCAACCCGCGTGCCGGTGCCACTTTCCCGATGGCACCACAGCGCACCACGCTCTCCTCCTCGTGTCCCTGCCACGGCTGCCAGCAGGGATGCCAGCCCATGGCTGCCAGCCTGCGCAATTTGATGACTGACTGCTCCCTCCGCCCCGCAAATTCGGGTACCTTCACTGGAAAGTGCGTTGCAGTCCCCAGTCGCAGCAGTGCGAGGGTGCCCAAGGTTGCCCATATTGCAGTGCCATGCTTTGGGCTTACTTGAGCCATCATTTCCGGACCATGCTGCACCTGGCTGCATGCATC
cat prot_223256
>3055_0:000816
MENYEYLGDLGSGSYGFVWKCVQRSTGRVVAVKGFKLAHTDKKFLDAAIREVRMLRNATDHPNIIQLLEAFRSSTGRVYMVFEFADKCLSAELHKRFTCGLPAGQTRVVLWQVLAAVAHLHSKKIIHRDIKPGNILMTSDGVVKLCDFGFARLTRGDPYQPDRFSSYVVTRWYRSPEMLVSDLYGAPSDIWSLGCTFAELATGRPLFPGASSLDQLWRIMRCMGPLPPTQAERFAAAATAAGLPEAPPPPPRGKSLWQRLPELDSRLLDLVQACVRLDPAQRPTAVQLMQMPYFHEIPKAIAGSRLEQLYLAIGSGTGYPGSALGRTASARFRQMQQLAAQQKAAGAAAGGAGSGTQPNVASVPAGGSAGVRGLGGSVTVSVMSPEELLASPRGGHATSGSVKRPASVLLSSVAEAVLGEKPSAGDGSGDCSIFPLAPPLPHIPMVDIAMLLSAQQQQQVQPQHQLQQAPLQGSQRYAAASAAAVVPLAATAAAGPSSSRLHSVSSPFKTVPMLPPLQPAPTSGDVVMPAAAAPIIAAAAASAAMSQSPRSSASMSSPSPHPPGTRRQLSGTSPRGAAPAGTASGRNLLAAATAGAGAAAGRQASGRGLPMGGLVGGVAAPESTGGGSSPTAAGVAVAVPPSVRLAHLSSLSPRQRQHLPQLSPLQRQQQSQALPAAATSVAMPPSAFLDAEARGDSLGSGSGGDGEETDDEILAARQGCRRNRQGYERDGSASRLGRNAGGAVPAGAAAMATATGGAAAAAALPPASASIMPVEAHAMPGLGLLEGYDAQDTSDDDEAQVSDDDELMAFYVARKSGGRGRRGGAAGTRATGSRRKVASAAAASTGALTTPAPAAASAAAMSASGAMHGAAAPAAAATKAAAAATRDELIGVALQAAAAVDMATQEMHMAGSTGGGMQPMQMEADAGMSLHVAAATTAAPLRGAHHNGVAAVDAAAPSPALASWPTAAAPAAGIIAASGLGPRAVAAQPPQRPLPHAGIHQQHHGLYGTQGSHHRQTMPRTTGGGGSSRGSTGTGATPVAAGLNRRVTAMVLGTGLEDAVSHASAAANPNTAATGTPASAAVAAASAPAQPRPLAPAALASCSPTPAVTITSAAATPVVAPLPPPPRFPTGAVAKRATVASYLAISQPNGSMAVTSASVLASGTSATVATADAAVAASSGTTVSQPLPVPRSVARGGQGAGMSGGIIVGTDTGGTGPVAGAVRGAATATGLTHMGTGSLPTVGSIGPGLRHHNHATTMGLTLMAPHESGPRGLGGGAAVTPSAAGVHLQGHGPASLPYGRASLPVQGGSYVGFSTGSANRRMLSRQGSTVFNQLMYDALPEIGTPGGAPDVPAGTPPPQRRRAVMSGFTPCRTAAARAAAEGLPAAAAMAAAMGSNTTDLSVAFSPIAVARHEDPLSIGDGHGLERSSVGAAPGFRSVQFGLACGAGGAYPGASAAGGAHRRQASMQMQTAYTASVGIMGAAGSDLGPSAATAIPGGGAAGGRGSGSYHSHASDTGMLMGSSAPVSHAMHPGYGSGSGSMGGSYRWPGQRILVPDQAHGLATATVTAASGPAGGPPVRGGRLPQAVGLAASGSSQQTSGSAASGAGPLGSGTTVGAAAGAHAAAATPGRSRLGSGILGRMSDDPAGGSMLGAGVGAGAGGGGSHGQHPVLVCTADDVHCSSALNIELDGSCSVGNNTGGGNSAGMWGFGPMAGYPAGAGAASGAVIAARAGGGGRSRWLGSGVIDSLPEDREVLHVAGVDDWRLGNSPGIAGGAGSGVGMAELVLGASDHYSSGLPPAPTSGPTLAEVSAAVAGAILAPSSSSAMGFGYKLSPRGQPATIPGQAGLMGLRPKSPAGSLELLRGRTNGHAGQASYGHGPSGLHQAGGALGSPSSPRSPGSGDAPGRPGSAQLPLAGDGSGMRFAANGSPSRAWVTEGCAAGGGTIGAAAVADVGAAAGAAGKLASADKAEKSKWPRAKALLGGKLISSLVKKFKDGVQVSD
RK
Thanks in advance Stephane
Never mind, I figured out that this protein does come from the Viridiplantae.fa input after all.
Here is a bioawk command to remove it, for others having the same issue:
mybad="3055_0:000816"
bioawk -c fastx -v header="${mybad}" '{if ($name != header) print ">"$name"\n"$seq}' ../Viridiplantae.fa > ../Viridiplantae_edited.fa
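If bioawk is not available, plain awk can do the same filtering. The sketch below is an alternative under stated assumptions, not something taken from BRAKER or ProtHint: `fasta_first_id` and `remove_fasta_records` are hypothetical helper names, and the record ID is taken to be the header text up to the first whitespace. Unlike the bioawk one-liner, it keeps header descriptions and the original line wrapping, and it accepts several IDs at once.

```shell
# Hypothetical helpers (not part of BRAKER or ProtHint).

# Print the ID of the first record in a FASTA file, e.g. the protein
# behind a spaln pair file such as prot_223256.
fasta_first_id() {
    awk '/^>/ { print substr($1, 2); exit }' "$1"
}

# Print FASTA_FILE to stdout without the records whose ID is listed
# in the remaining arguments.
# Usage: remove_fasta_records FASTA_FILE ID [ID ...]
remove_fasta_records() {
    fasta_in="$1"
    shift
    awk -v ids="$*" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) drop[a[i]] = 1 }
        /^>/  { skip = (substr($1, 2) in drop) }   # decide once per header line
        !skip { print }                             # header and sequence lines of kept records
    ' "$fasta_in"
}
```

For example: `remove_fasta_records ../Viridiplantae.fa "$(fasta_first_id prot_223256)" > ../Viridiplantae_edited.fa`.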
The same problem occurred with the sequence pair below (it has been stuck for 14 h 30 min on a 5975WX system). I hope this helps in solving the problem.
Command
braker.pl \
--genome=../202_repeat_mask/${ID}_masked_nuc.fasta \
--species=${SPECIES} \
--prot_seq=/data/genome/db/orthodb/odb11v0_all_fasta_no_asterisks.tab \
--GENEMARK_PATH=${PATH_BIN}/GeneMark-ETP/bin \
--PROTHINT_PATH=${PATH_BIN}/ProtHint/bin \
--AUGUSTUS_CONFIG_PATH=${PATH_ENV}/braker3/config \
--AUGUSTUS_BIN_PATH=${PATH_ENV}/braker3/bin \
--AUGUSTUS_SCRIPTS_PATH=${PATH_ENV}/braker3/bin \
--fungus \
--threads=${THREAD} \
--softmasking \
--useexisting
>2090_g
CCTTTGGTCGATCCGGATTTGCAATTAAAAGTTCTGACTCGAATATTACGCAGCGAGTAGCAGCAGGCTTGCGTGGACACTTCTTGCGCAGCAGGAGAGTGAAAGAAATGATTACATCCCAGAACCGTCGAGTTCGGACTCTGACGCGTGCTTGTAATCAACGCATACACACATCCTTCTTAATAATACACAGCGTACACTACACATACTTACAAAGAATTAGTCGAGAGGTGCATTACCAGGGCCTCTGGATGTCCGCCTTCCGAACTGACTCCGTGCAAAGGCCGAGCTGGAGTTTCGTTTCATCTTGAGTGAATATGCTTTCGCTTTCGAAGGGCAAAAATAAAGCGTCACGTACAATCGAGAACCCGTGCCACGGATGTTTGATAACAACGCGCTCTCCAACGCGGAAGCAGACAAGCTTGTAGACTCTGCTGGTGTTCTTCAGAATACGACACGTCAGCTAGACTTGTAAGCACGTACCACTGATGTCACTGAAGCTCGATGCACTGGATCCATGCGTGCTGCTCCTGTCGCTAGTATGCGCCATCCCAACCGCTATCGGTCTATACTGCATGTGTGGTAGACCCCTCGCCAGACCCGAGGACAGAGCAGGCGGGCTAGATGACTTAGTTTGTGTGAAATGTCGTGCTACAGGGGACTGCGTCTGGGATTCAGGAGGTGGAGATGGGATAGACGGTATGGAGCCCTGAGGAGAGCTCGTGTTCGAGTTGAAGTCGCTAGAGCTAAGGTCTCTCCTGGCATTCCCTGTATACGTCCGGCGCAATGGTGACGGAGAAGTGGGTGACGTGATGTCGCGTCCGCGCTGAGGCTGCTTGCCTTTCGACTTTGGGCGGCCATCTCGCACAAGCCCAAGAGCGTCACTTGTCAGTACAGACTGAAGATGCTGCAGTTTCTTATCGAGTGCCTCTTGCTCCTCCAAGCGGCGTTCCTCCTCCTCCGCTTTCTCGGCTTCTTCATCTATTTCAGAGTCCGAGTCTGAATGCGCCGGTGACGATGTCGACGTCTTTATTGGCGGTGGTACATGTCGAGTACCTTGCAGAGTGAGGACGGAGGATGAAGACGTCTTCTGCAGCTTGGATCCTGAATCATAGCCTAACGAGTTCAGTCGCGCACGAATGCCAAGGGGCGTATTCAATCGGCCAGATGAAAGCAACCTGGCACTTCCCGCCGTCCGACCAGACGTCCTTCTGACCAACTCTGGACGATCCTGGCCCTTGACGTCTACCTCAACGGGGCGTGCTGAAGGAGGACCCGTGGCAGGTGCGAAAGGAACTTGAAGATGTTGTATGCCCTTCAAGTCCTCTTCATAGCGTGTTTGTGCTCTGAATAGAAGATAGGGAAGAGGAACTGCTAAGTGTGCTGCGAGACCTTGCCCTTGAATGGTTCACATTGGCCTGTCAGTATCGAGAAATACCAATATAATCAATAATCCATCTCACAATCCGTTCCTCCGCTATCTGAAGCTCTTGAACGCGCAATGACCTCCCATAGGATGCTTTCCTTCTCAGCATTCCATTCTATCTAGTTTTCAATGAGTTGGGTGAACGCTATCCAAACATGCTTGTGAATGGACCTACTCGTGGTGGGTTCTCGTAGCCTTCTTGTGGTCGATTGTACGGTAGACGGATGATAATGCGCACGGATGGTATCGCCGATGAAGAAGGCATTGCTAGCATTTACATGCAGGCCTTTCGAATGCCAGTGCTAACAGTGAAAGTTGCGTAATGTTACAAGCCCCGTGTCTCCGTGACTCCGGATTGCCTCACCCCTCCACACCACTCACAAATTGTCTTATTCTGCCTATGCCTCGGGCCTCAGCGCCGATTTTGGTCAAAGTAATCCGGCTTGCCTCAGCCTTTCGACCAAAATTTCGTCCCAAATTATACGGTTTCCTCCCAGACACGTACTGGTGGAATCGCCGGCGGGGATCCTTCCACATCCCACGTCGATCGTTCAACATCCCAACACGAACCACC
ATGACCGTGAGCTCGACTACTGCAGATGAGGGCGAGGAAACCAAAAACGATGCTCAAGAGCTCGACGAACTACTGGGGAATATGGCTCTCGATCCTGAAAATGAACAGGTCGTTTCAGAAATTGGGGGTGGGCGGAGCTTTCTCTCAAGCGACTATCCCGTGCCAATACAAGTTCTTGTGATCTCCCAGTGGTGCGATCTATCTATGAAGGCAGCAATGATGCAATAAGAGTCTGGAATCCGAACAATTCTGAGAGTACGAGTGCGAGCAACAACTCCGATGGCAAGGTAACGCGCTTCGAGGTTCACTTAAGGTACGCTTAATATTGTTTTCGTTGTGAACGATTGACCAACGTGTATACAGCTTATCGTCCTCACAAGACGGTCCCGAGAGCCGCTTCAAGGTCCTTGTATCGCTGCCACGAACATATCCATCTTCATCTCCGCCACAAATTCAGCTCCTGTCGCGTTACATCGGCGCGTTCAGTGTTGACGCAGACCTCTTCGGAGCAGTCATTCGTACATTCATCTCATCTAGAGATGGCGTTGAATGGCTTCCAGGTACAGAATGCATCTTCGATGGATTAGAGAACATCCGGGAACGCGTTGCTAAGTGGTACGACGAACGCCTTAGTGAAGAAAAGGCTCTGGAACTCGTAAGAGACGACGGAAAGGAAGGGACGCACGAAGACAAGCATCCGACCGACGAGATTGAATCTGTGGACAAATCCTCCAAGGGTCGCAGACCACAGGCTTCTCTGCCCGAGGGCATTGTTCTCCATGTATCGGAGCCTATCGTTGATCGGAAAAGTGTTTTTATAGGACGGGCGTGCCGAATATCTCATCCGTCTGAAGTATGCTTCAAGCTCCTTTTCTCTTTCGCGTTAAATTTTAACTCATCTGGTGTTTGTCCAGGTTGACTCCGTTTTATCGTATCTCGTCGCGGACCGAAAAATAGCTCGCGCTACCCATCCGGTTATAAATGCCTGGAGATGCAAAGTGAACGGAACACTCCACCAAGGTAAGGCTTGTTGCGTTTGTATCTGTCTAGTCCAATGTTGCTTAGATTAATTCTTCCACTCAGACAACGATGACAATGGAGAAAACGCCGCGGGGAGCCGCTTGGCTCACTTACTACGAATTTTGGTAAATGTTGCTCCTATGGACGCAGTCATCAATTCGCTCACTACGCATTTTGAAGGACGTTGATAATGTCCTTGTGATCGTCACTAGATCCTTCGGTGGCATCCGTTTGGGCCCCGACCGTTTCAAGCATATTAACCAAGCTGCTCGCAATGCTTTGGAGATAGGAGGATTCTTAGACGCACCAGATGATAAGAAGAATACCTCAAGGCCGAAAAAAAGACACTAAGATCATAGAATGGCTTCAAATACAGTGAAATGGTCAATAAGTAGTACAATGTTCGCGATCGAGCTAGCTCTGAAGTACTGTTTGTCACGTGGCCTGCTTGGAGAAAGATCACTGCTTGGGGCTGTTACACTGACACTGCATTCCCAACTCATGACTCCGTTCTGTGCTGTGTATTGCCGTCCACGTCGAAGCCACAGCTGAGATCGCTGACTCAGTATGATATGTAGGCTGTGACAAGTGTATGCATCTGGTTTGCTTCTTGTACGCATAAAACATCTAACGTGGCCAGGCAAGACAAACAGGTCATGTTTAACAACTCTTACCGCGTTCATGAGCTAGCTGGCAAACTCGACAATACATATCAAGATTCTCGCGAACCTCTGGTCCAGAAAAGCCTGGGTTCTGAGCCATGCCTTTTGGCGAGGCTGCCTTCAACCAGTGAGTATTTTCTCAGAGACATTCAGACCTGCAGAACGACGTCTCACGTTGCGTATAATTACCAGGTTTCTTGGTGCGAGAAGGAGAAGCTCTTGCTCCGAGGGCAGTTCTCAAGCAGCTAAGACGTAGTCCGTCACCACCGACGTCCCCGACGGATACTGATGAAGCCGAGAGACAAGAGCCAAGTAAGT
CGAAGACACAGGAGACGGAAAGGAGATCATGGTTTTCTCCAAGAACTGTTAAACGTCCTCAGACCACGTGGAAAGAGCCTCAGGTTTGTACTTATGTGACTTCCGCAGAAGCAAGTGCCTAACAAACAGGGTGTTAATATAGTTATATGAGGTCTTTCGTGCAATTGAGCGGAAGGACATCATGTTTCTCATGGAGGTACGGGATCGAGCATTTCATGTAAGCACTTCTCAGGGCATTTTTATTTGCTATCTCTGACTTTATCGCAAGCTTTTACTCAAAAAGAGTGGGGATGCGACGCCACTCGTACACGCTATGCGGATAGGCGATTCACACCGTGACGTCGCAATTATCATTCTCGGTGCCTTGTCGCGATGGGTTAACCATTTGGAAGACAGCGACATGGCCGACAAGCGAACGAAACCGTTACTCAAAGCTTTGCGTGAGCCATCTCCACTTTACTTTTTACTGGTGAACATGTATATTCACAGACGAAGCAAGGCACCAATCTGAAACTCGCCGTTGACTATGGCCTGCAGCGCTCGCAATCGGACCTCATCCCTTCTTTCATGCAGACCCTGGTCATGAGTGAGGGTGAAAGATGGATTATCGATCAGACGCATAACGTGGCACTTGCACTTCGTGCTGGTACAGAAGGGAAACCTGTTCATACCGCTGAAACTGTTGTTAGGAAGTTCGCGACAAGGGAGCTTGGCAAGGCCGAGCTCATAGCGTCGTTAGAAGATTAGTAAGTGTTTAGCGCATTTATTCACGTTACCGACTCCTCAATGTTTGTCTGCAGCATAGCCAATGCCACTGCAGATCTGTTAGTCCTAGCCGCCTGCTCATGTGTCCTTGATTCTGTTCAAGCGGAACCTATTCCGGTGCGAGTCATTTATGACAGACTTCGGGTTTGATACTTACCATCGGCCAGACGTATTACTTTGCACGAGACACAAGAGTTTTCAATGCTTTCCAGGAACGTCTACAACATCACAAGGGGGCTCTGATGGGTCTTAGCAAACGCCTAAGGTGGCAGATTAGGGTCTTGGAGCACGTACTAGAAGGGCGGTTTAACTCATTCAGGGTATGCATTGCCTGTATTACAGATTATATGGGCTGATCCTTTCTTTTTTTTCTTTCCAGAAAAAGGTCGAGTTGTTGGCCTATGAGCTAGACGAGGGTCCAGGAGTATGATAATTCTGTGCTTTCAAGATCATAATTTTTTGGCCATTCTATGAGCAAGCTGACCGAGTTATTCTCCGTGCTGATTGATGTACACAGCACTAGTAAGAGTCTGAGAGCTCCCCAAAGAATTTTGCAATATCACGCTCGTATAAGGTGCCGCACGAGTAAAAGTTCTCAACTCGAAG
>93625_1:000408
MRRSPSPSTADADNTDYALELQDFLAELSQDPEREAVASEIQVLQSIYGDDAIRLWRPPLKNGKRSASTSRRDGTIRYEVLLSLSSPHDDVSLKVLVSLPETYPKSSPPQLQLLSKYIGSFGADANLFGSILRTYISVSGVEWLEDTVCVFDGLQNVLDRCVSWYEDRLSAEKAGELVRDDGKEAVAVSTRPVSPTGQTNAEISGIADSAPAPVPNALPIGIHIYVAEPITDRKSAFVGRACRIHHPSETRFMCAELFAFKVPLILSHLMSDRRISRAAHPIINAWRCQVDSVLHQGSSHNDDDGETAAGGRLAHLLQILEVNDVLVIVTRYFGGIHLGPDRFKHINQAARNALDLGGFLDAPENKKNTGRVKKH