
Prothint - Error: Inflate error

Open: tanpham15 opened this issue 11 months ago • 1 comment

Dear authors,

I got an "Inflate error" at the ProtHint step.

# Thu Mar 21 12:48:30 2024: Calling prothint.py...
# Thu Mar 21 12:48:30 2024: starting prothint.py
/data/scratch/mpx586/github/gene_predict/ProtHint/bin//prothint.py --threads=2 --geneMarkGtf /data/scratch/mpx586/Batesia_hypochlora/RNA/braker3/braker1/GeneMark-ES/genemark.gtf /data/scratch/mpx586/Batesia_hypochlora/RNA/braker3/braker1/genome.fa /data/scratch/mpx586/Batesia_hypochlora/RNA/braker3/braker1/proteins.fa

Here are the warning messages from the beginning of the run:

 WARNING: empty line was removed! This warning will be supressed from now on!
#*********
# Wed Mar 20 14:18:30 2024: check_fasta_headers(): Checking fasta headers of file /data/scratch/mpx586/Batesia_hypochlora/RNA/braker3/orthodb/Arthropoda.fa.gz
#*********
# WARNING: Detected whitespace in fasta header of file /data/scratch/mpx586/Batesia_hypochlora/RNA/braker3/orthodb/Arthropoda.fa.gz. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
#*********
# WARNING: Detected | in fasta header of file /data/scratch/mpx586/Batesia_hypochlora/RNA/braker3/orthodb/Arthropoda.fa.gz. This may later on cause problems! The pipeline will create a new file without spaces or "|" characters and a genome_header.map file to look up the old and new headers. This message will be suppressed from now on!
#*********
#*********
 WARNING: empty line was removed! This warning will be supressed from now on!
#*********
# Wed Mar 20 14:18:30 2024: Assuming that this is not a DNA fasta file because other characters than A, T, G, C, N, a, t, g, c, n were contained. If this is supposed to be a DNA fasta file, check the content of your file! If this is supposed to be a protein fasta file, please ignore this message!
# Wed Mar 20 14:18:30 2024: Assuming that this is not a protein fasta file because other characters than AaRrNnDdCcEeQqGgHhIiLlKkMmFfPpSsTtWwYyVvBbZzJjOoUuXx were contained. If this is supposed to be DNA fasta file, please ignore this message.
#*********
# WARNING: something seems to be wrong with the newline character! This is likely to cause problems with the braker.pl pipeline! Please adapt your file to UTF8! This warning will be supressed from now on!

Note:

  • Inputs: a soft-masked genome and the Arthropoda protein database downloaded from https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11/Arthropoda.fa.gz
  • GeneMark ran successfully; the output of gmes_petap.pl is genemark.gtf (11.29 Mb)
  • ProtHint version: 2.6

Could you please take a look and let me know how I can solve this problem? Thank you very much.

tanpham15, Mar 23 '24 11:03

I found the issue: I passed the compressed file "Arthropoda.fa.gz" as the protein input.

The protein file needs to be unzipped before running.
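For anyone hitting the same error, here is a minimal sketch of the fix, assuming the cause is simply that the protein FASTA is still gzip-compressed. The helper names `is_gzipped` and `ensure_plain_fasta` are hypothetical and are not part of BRAKER or ProtHint; the sketch only checks the gzip magic bytes and writes an uncompressed copy next to the original.

```python
import gzip
import shutil
from pathlib import Path

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def ensure_plain_fasta(path):
    """Hypothetical helper: return a path to an uncompressed FASTA file.

    If `path` is not gzip-compressed it is returned unchanged. Otherwise an
    uncompressed copy is written next to it (dropping a trailing ".gz", or
    appending ".plain" if the name has no ".gz" suffix) and that path is returned.
    """
    src = Path(path)
    if not is_gzipped(src):
        return src
    if src.suffix == ".gz":
        dst = src.with_suffix("")  # Arthropoda.fa.gz -> Arthropoda.fa
    else:
        dst = src.with_name(src.name + ".plain")
    # Stream the decompressed bytes so large databases are not loaded into memory.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `ensure_plain_fasta("Arthropoda.fa.gz")` would write `Arthropoda.fa` in the same directory; that uncompressed file is then the one to give to the pipeline as the protein input. Running `gunzip` on the file from the shell achieves the same thing.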

Please close the issue. Thank you very much.

tanpham15, Mar 31 '24 20:03