BRAKER
GeneMark `make_nt_freq_mat.pl` fails because of empty files
Hi!
My BRAKER run fails at the GeneMark-EP stage with the error below. I think the cause is that all the files in GeneMark-EP/run/EP_ini/ are empty (acc.seq, don.seq, intron.len, parse.ep_ini). How would I find out what the root cause for this is? All other files in the workdir, for example genemark_hintsfile.gff, seem OK, but I'm not sure what to look for.
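A quick way to confirm which intermediate files are empty is to list them with their sizes. The following is only a minimal sketch: the helper names `find_empty_files` and `report` are hypothetical, and the directory (GeneMark-EP/run/EP_ini, as named above) has to be adjusted to your own working directory.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the sorted names of regular files in `directory` with size 0."""
    root = Path(directory)
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.stat().st_size == 0
    )


def report(directory):
    """Print one line per regular file: a flag, the size in bytes, the name."""
    for p in sorted(Path(directory).iterdir()):
        if p.is_file():
            size = p.stat().st_size
            flag = "EMPTY" if size == 0 else "ok"
            print(f"{flag:5}  {size:>10}  {p.name}")
```

Calling `report("GeneMark-EP/run/EP_ini")` from the working directory would show at a glance whether acc.seq, don.seq and intron.len are all empty. That only narrows down where the pipeline stopped producing data; it does not explain why.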
Thanks!
The command I ran:
braker.pl --species=SpeCie --genome=../SpeCie_genome.fasta.smasked --prot_seq proteins.fa --softmasking --cores 32 --workingdir=.
This is the Braker output of the error:
...
[Mon Mar 14 12:47:15 2022] ProtHint finished.
ERROR in file /miniconda3/envs/braker2/bin/braker.pl at line 6739
Failed to execute: perl gmes_petap.pl --verbose --seq braker2/genome.fa --EP braker2/genemark_hintsfile.gff --cores=32 --gc_donor 0.001 --evidence braker2/genemark_evidence.gff --soft_mask auto 1>braker2/GeneMark-EP.stdout 2>braker2/errors/GeneMark-EP.stderr
Failed to execute: perl gmes_petap.pl --verbose --seq braker2/genome.fa --EP braker2/genemark_hintsfile.gff --cores=32 --gc_donor 0.001 --evidence braker2/genemark_evidence.gff --soft_mask auto 1>braker2/GeneMark-EP.stdout 2>braker2/errors/GeneMark-EP.stderr
The most common problem is an expired or not present file ~/.gm_key!
All the files in ./errors are empty, including GeneMark-EP.stderr.
This is the output of GeneMark in GeneMark-EP.stdout:
# check before the run
# hard_mask is in the 'auto' mode. hard_mask was set to: 100
# creat directories
# commit input data
# prepare input data report
# commit training data
# prepare training data report
# prepare initial model
# find GC of sequence
GC 37
# build initial EP model
error, no valid sequences were found
error on call: {paths removed}/make_nt_freq_mat.pl --cfg {paths removed}/GeneMark-EP/run.cfg --section donor_GT --format DONOR
I ran into the same issue when running on a genome of about 13 M.
My command was: perl gmes_petap.pl --verbose --sequence=genome.fa --ET=genemark_hintsfile.gff --et_score 1 --max_intergenic 50000 --cores=10 --fungus --soft_mask 1000
I am also encountering this issue on a fresh install (both git pull from main and via the latest release zip) with the test2 within example/tests. I was able to pass both tests within the GeneMark installation I have, so it doesn't seem like an issue with GeneMark (at first glance at least).
Updating the dependency log_reg_prothints.pl (in scripts/log_reg_prothints.pl) to the version found in ProtHint (https://github.com/gatech-genemark/ProtHint/blob/master/dependencies/log_reg_prothints.pl), which keeps the al_score field in the output GFF file of the ProtHint step, seemed to fix this problem. Could this be fixed in the repository?
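To check whether a hints file is affected, one could count how many feature lines carry an al_score attribute. This is a rough sketch with assumptions: the function name `count_al_score` is hypothetical, and it assumes a tab-separated GFF whose ninth column holds `key=value` pairs separated by `;`.

```python
def count_al_score(gff_lines):
    """Count feature lines with and without an al_score attribute.

    Assumes tab-separated GFF with attributes in column 9 written as
    key=value pairs separated by ';' (an assumption about the hints format).
    Returns a tuple (with_score, without_score).
    """
    with_score = 0
    without_score = 0
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip lines that do not have an attribute column
        keys = {
            kv.split("=", 1)[0].strip()
            for kv in fields[8].split(";")
            if kv.strip()
        }
        if "al_score" in keys:
            with_score += 1
        else:
            without_score += 1
    return with_score, without_score
```

For example, `count_al_score(open("genemark_hintsfile.gff"))` returns the two counts for the hints file in the working directory. If no line carries al_score, that would be consistent with the outdated script having been used, although it is not proof.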
Thank you, @skagawa2. The log_reg_prothints.pl in the BRAKER/scripts folder should actually never be called anymore since it now comes with ProtHint. But I guess it is possible that ProtHint tries to use BRAKER's version if it cannot, for some reason, access the intended script located in the ProtHint/dependencies folder.
To ensure maximum compatibility, I am updating BRAKER's log_reg_prothints.pl script to the ProtHint version (as you suggested).
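To see whether a local installation still has two different copies of the script, the files can be compared byte for byte. The sketch below is illustrative only: the helper names are hypothetical, and the paths in the usage note are assumptions about where BRAKER and ProtHint are installed.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_contents(path_a, path_b):
    """Return True only if both files exist and are byte-identical."""
    a, b = Path(path_a), Path(path_b)
    if not (a.is_file() and b.is_file()):
        return False
    return sha256_of(a) == sha256_of(b)
```

A call such as `same_contents("BRAKER/scripts/log_reg_prothints.pl", "ProtHint/dependencies/log_reg_prothints.pl")` returning `False` would suggest that the two copies have diverged, or that one of them is missing at the assumed location.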
@Fu-Yin and @Shellfishgene, I suspect your problem could also be caused by this issue: https://github.com/Gaius-Augustus/BRAKER/issues/49. The solution is described there.