
GeneMark `make_nt_freq_mat.pl` fails because of empty files

Open Shellfishgene opened this issue 2 years ago • 4 comments

Hi!

My BRAKER run fails at the GeneMark-EP stage with the error below. I think the cause is that all the files in GeneMark-EP/run/EP_ini/ are empty (acc.seq, don.seq, intron.len, parse.ep_ini). How can I find the root cause of this? All other files, for example genemark_hintsfile.gff in the working directory, seem OK, but I'm not sure what to look for. Thanks!

The command I ran:

```
braker.pl --species=SpeCie --genome=../SpeCie_genome.fasta.smasked --prot_seq proteins.fa --softmasking --cores 32 --workingdir=.
```

This is the Braker output of the error:

```
...
[Mon Mar 14 12:47:15 2022] ProtHint finished.
ERROR in file /miniconda3/envs/braker2/bin/braker.pl at line 6739
Failed to execute: perl gmes_petap.pl --verbose --seq braker2/genome.fa --EP braker2/genemark_hintsfile.gff --cores=32  --gc_donor 0.001 --evidence braker2/genemark_evidence.gff  --soft_mask auto 1>braker2/GeneMark-EP.stdout 2>braker2/errors/GeneMark-EP.stderr
Failed to execute: perl gmes_petap.pl --verbose --seq braker2/genome.fa --EP braker2/genemark_hintsfile.gff --cores=32  --gc_donor 0.001 --evidence braker2/genemark_evidence.gff  --soft_mask auto 1>braker2/GeneMark-EP.stdout 2>braker2/errors/GeneMark-EP.stderr
The most common problem is an expired or not present file ~/.gm_key!
```

All the files in ./errors are empty, including GeneMark-EP.stderr. This is the output of GeneMark in GeneMark-EP.stdout:

```
# check before the run
# hard_mask is in the 'auto' mode. hard_mask was set to: 100
# creat directories
# commit input data
# prepare input data report
# commit training data
# prepare training data report
# prepare initial model
# find GC of sequence
GC 37
# build initial EP model
error, no valid sequences were found
error on call: {paths removed}/make_nt_freq_mat.pl --cfg {paths removed}/GeneMark-EP/run.cfg --section donor_GT    --format DONOR
```
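
For anyone retracing this, the "which files are empty" check described above can be scripted. This is only a minimal sketch: the helper name `find_empty_files` is made up for illustration, and the directory you pass in (for example GeneMark-EP/run/EP_ini or ./errors inside your working directory) depends on your own run layout.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the sorted names of zero-byte regular files directly inside `directory`.

    Hypothetical helper for illustration; it is not part of BRAKER or GeneMark.
    """
    root = Path(directory)
    return sorted(
        p.name for p in root.iterdir() if p.is_file() and p.stat().st_size == 0
    )


# Example call (path is an assumption about your working directory):
# print(find_empty_files("GeneMark-EP/run/EP_ini"))
```

An empty stderr file is normal when a step wrote nothing to stderr, so the output of this check is a starting point for narrowing things down, not a diagnosis on its own.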

Shellfishgene · Mar 15 '22 09:03

I ran into the same issue when running on a genome of about 13 M.

Fu-Yin · Apr 29 '22 09:04

My command was: `perl gmes_petap.pl --verbose --sequence=genome.fa --ET=genemark_hintsfile.gff --et_score 1 --max_intergenic 50000 --cores=10 --fungus --soft_mask 1000`

Fu-Yin · Apr 29 '22 09:04

I am also encountering this issue on a fresh install (both git pull from main and via the latest release zip) with the test2 within example/tests. I was able to pass both tests within the GeneMark installation I have, so it doesn't seem like an issue with GeneMark (at first glance at least).

skagawa2 · Jun 01 '22 16:06

Updating the dependency log_reg_prothints.pl (in scripts/log_reg_prothints.pl) to the version found in ProtHint (https://github.com/gatech-genemark/ProtHint/blob/master/dependencies/log_reg_prothints.pl), which keeps the al_score field in the output GFF file of the ProtHint step, seemed to fix this problem. Could someone fix this in the repository?
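
To see whether a hints file is affected, one rough check is to count how many feature lines still carry the al_score field. The sketch below assumes the field appears as `al_score=<value>` in the ninth (attributes) column of a tab-separated GFF file; that layout and the function name `count_al_score` are assumptions for illustration, so compare against a few lines of your own file first.

```python
def count_al_score(gff_lines):
    """Count GFF feature lines with and without an `al_score=` attribute.

    Assumes tab-separated GFF with attributes in column 9 (an assumption,
    not something confirmed in this thread). Returns (with_score, without_score).
    """
    with_score = 0
    without_score = 0
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete feature line
        if "al_score=" in fields[8]:
            with_score += 1
        else:
            without_score += 1
    return with_score, without_score


# Example call (file name taken from the thread, location is an assumption):
# with open("genemark_hintsfile.gff") as fh:
#     print(count_al_score(fh))
```

If no lines at all carry the field, that would be consistent with the older script having been used, though it does not prove it.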

skagawa2 · Jun 01 '22 19:06

Thank you, @skagawa2. The log_reg_prothints.pl in the BRAKER/scripts folder should actually never be called anymore, since the script now comes with ProtHint. But I guess it is possible that ProtHint tries to use BRAKER's version if it cannot, for some reason, access the intended script located in the ProtHint/dependencies folder.

To ensure maximum compatibility, I am updating BRAKER's log_reg_prothints.pl script to the ProtHint version (as you suggested).
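
On an existing install, a quick way to tell whether the two copies of the script already match is to compare their contents. This is a minimal sketch: the helper names are made up, and the two paths in the example call are assumptions about where your BRAKER and ProtHint checkouts live.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(path_a, path_b):
    """Return True if both files have byte-identical content."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example call (both paths are assumptions about a local setup):
# print(same_content("BRAKER/scripts/log_reg_prothints.pl",
#                    "ProtHint/dependencies/log_reg_prothints.pl"))
```

A result of False only says the files differ; it does not say which copy ProtHint actually invoked during the run.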

@Fu-Yin and @Shellfishgene, I suspect your problem could also be caused by this issue: https://github.com/Gaius-Augustus/BRAKER/issues/49. The solution is described there.

tomasbruna · Aug 18 '22 22:08