$genome not recognized in reference tdv file
Hello!
I am hoping to use OrthoFiller and am having an issue with my reference species data. When I run OrthoFiller, I get the following error:
Gtf file /path/species_files/genomic.gtf contains coordinates that do not exist in genome file $genome. Please adjust and try again.
I have installed all of the requested dependencies (including those listed in the previous "Issues" post), and have cleaned the headers on my fasta file. I have also double-checked that the gtf contig names are the same as those in the fasta file. I am thinking that for some reason OrthoFiller isn't recognizing my genome paths. I'm not sure why, as I have formatted the file as requested.
Do you have any idea why I might be encountering such an error?
Thanks, Zoe
This error suggests that OrthoFiller is recognizing your paths; however, some coordinates in your gtf reference regions of your genome that do not exist.
How did you procure your gtf?
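One way to narrow this down before rerunning is to compare every gtf record against the sequence lengths in the fasta. The following is only a rough sketch of such a check, not OrthoFiller's own validation: the function names `fasta_lengths` and `out_of_range_records` are made up for illustration, and it assumes a standard tab-separated gtf (sequence name in column 1, 1-based start and end in columns 4 and 5) and that sequence names are the fasta header text up to the first whitespace.

```python
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence name to its length.

    Assumption: the name is the header text up to the first whitespace.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range_records(
    gtf_lines: Iterable[str], lengths: Dict[str, int]
) -> List[Tuple[int, str]]:
    """Return (line number, reason) for each gtf record that does not fit the genome."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(gtf_lines, start=1):
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            problems.append((number, "fewer than 5 tab-separated fields"))
            continue
        seqname, start_text, end_text = fields[0], fields[3], fields[4]
        if seqname not in lengths:
            problems.append((number, f"unknown sequence {seqname!r}"))
            continue
        try:
            start, end = int(start_text), int(end_text)
        except ValueError:
            problems.append((number, "start or end is not an integer"))
            continue
        if start < 1 or end < start or end > lengths[seqname]:
            problems.append(
                (number, f"{start}-{end} is outside 1-{lengths[seqname]} on {seqname}")
            )
    return problems
```

To use it, pass open file handles for the fasta and the gtf (both iterate line by line) and print whatever `out_of_range_records` returns. An empty list means every record sits on a known sequence and within its bounds, under the assumptions above.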
Thanks, @xonq ! I fixed my gtf with gffread (which seems to do a pretty good job at standardizing gtf files by making them as simple as possible). I had downloaded the original files directly from RefSeq.
@14zac2 yeah, so OrthoFiller requires your gtf to be perfectly formatted beforehand. That program may be doing its job, but it may leave discrepancies OrthoFiller cannot handle.
I have found that the only gff to gtf scripts that work well are the GFFtools_GX ones. OrthoFiller recommends you use that tool, and you can also integrate it into OrthoFiller's "safe" script in their utils folder: simply open their utils script and insert the path to GFFtools_GX/gff_to_gtf.py into the variable that asks for it.
If you have a problematic coordinate set, those scripts should flag it and let you know or move on - it's the most direct way to get your gff working. If you're in the unfortunate situation (which happens) where your gff does not play nice with those scripts, you will have to edit / create your own or do some downstream filtering until it works.
I revisited this issue because I ran into the same problem. It turns out that OrthoFiller will pull out fasta headers with spaces... perhaps this is why they recommend cleaning your fasta at the start of the run. Did you run the sed command mentioned in the description to clean your fasta?
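If it helps to check whether a fasta still has this problem, here is a small sketch that lists headers containing more than one whitespace-separated token and, optionally, truncates each header at its first whitespace. This is not the sed command from the OrthoFiller description, and "keep only the first token" is my assumption about what a cleaned header looks like; the function names are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def multi_token_headers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, header) for headers with more than one token."""
    flagged: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">") and len(line[1:].split()) > 1:
            flagged.append((number, line.rstrip("\n")))
    return flagged


def truncate_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield fasta lines with each header cut at its first whitespace.

    Assumption: the first token alone is the name the gtf refers to.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield ">" + (fields[0] if fields else "") + "\n"
        else:
            # Sequence lines pass through, normalised to end in a newline.
            yield line if line.endswith("\n") else line + "\n"
```

Whatever cleaning you apply, the names left in the fasta need to match the contig names in the first column of the gtf exactly, so it is worth rechecking that after truncating.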