$genome not recognized in reference tdv file

Open 14zac2 opened this issue 4 years ago • 4 comments

Hello!

I am hoping to use OrthoFiller and am having an issue with my reference species data. When I run OrthoFiller, I get the following error:

Gtf file /path/species_files/genomic.gtf contains coordinates that do not exist in genome file $genome. Please adjust and try again.

I have installed all of the requested dependencies (including those listed in the previous "Issues" post), and have cleaned the headers on my fasta file. I have also double-checked that the gtf contig names are the same as those in the fasta file. I am thinking that for some reason OrthoFiller isn't recognizing my genome paths. I'm not sure why, as I have formatted the file as requested.

Do you have any idea why I might be encountering such an error?

Thanks, Zoe

14zac2, Nov 24 '20 13:11

This error suggests that OrthoFiller is recognizing your paths; however, some coordinates in your gtf reference regions of your genome that do not exist.

How did you procure your gtf?
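A quick way to see which records trigger this kind of error is to compare every gtf record against the contig lengths in the fasta. The sketch below is not part of OrthoFiller: `contig_lengths`, `out_of_range` and the `trim_header` flag are names made up for illustration, and it assumes standard 1-based, inclusive GTF coordinates in columns 4 and 5.

```python
def contig_lengths(fasta_lines, trim_header=False):
    """Map each contig name to its sequence length.

    With trim_header=False the whole header line (minus '>') is the name;
    with trim_header=True only the text before the first whitespace is used.
    """
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            parts = header.split()
            name = parts[0] if (trim_header and parts) else header
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range(gtf_lines, lengths):
    """Yield (line_number, reason) for GTF records that do not fit the genome."""
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            yield number, "fewer than 9 tab-separated fields"
            continue
        contig = fields[0]
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            yield number, "non-numeric start or end"
            continue
        if contig not in lengths:
            yield number, f"contig {contig!r} not in genome"
        elif start < 1 or start > end or end > lengths[contig]:
            yield number, f"coordinates {start}-{end} outside 1-{lengths[contig]}"
```

Passing the opened fasta and gtf files (file objects iterate line by line) and printing each `(line_number, reason)` pair should show whether the problem is a contig name mismatch or a genuinely out-of-range coordinate.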

xonq, Dec 11 '20 03:12

Thanks, @xonq ! I fixed my gtf with gffread (which seems to do a pretty good job at standardizing gtf files by making them as simple as possible). I had downloaded the original files directly from RefSeq.

14zac2, Dec 14 '20 17:12

@14zac2 yeah, so OrthoFiller requires your gtf to be perfectly formatted beforehand. That program may be doing its job, but it may leave discrepancies OrthoFiller cannot handle.

I have found that the only gff to gtf scripts that work well are the ones in GFFtools_GX. OrthoFiller recommends you use that tool, and you can also integrate it into OrthoFiller's "safe" script in their utils folder: simply open their utils script and insert the path to GFFtools_GX/gff_to_gtf.py into the variable that asks for it.

If you have a problematic coordinate set, those scripts should flag it and let you know or move on; it's the most direct way to get your gff working. If you're in the unfortunate situation (which happens) where your gff does not play nice with those scripts, you will have to edit / create your own or do some downstream filtering until it works.
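As a rough idea of what such downstream filtering could look like, here is a minimal sketch that drops every record of any transcript with at least one record outside its contig. It is an illustration only, not something OrthoFiller or GFFtools_GX provides: `record_fits` and `drop_bad_transcripts` are hypothetical names, `lengths` is assumed to be a dict of contig name to sequence length that you build yourself, and attributes are assumed to follow the usual `transcript_id "..."` GTF form.

```python
import re

# Matches the transcript_id attribute in a GTF attribute column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def record_fits(fields, lengths):
    """True if a split GTF record lies inside a known contig (1-based, inclusive)."""
    try:
        start, end = int(fields[3]), int(fields[4])
    except (IndexError, ValueError):
        return False
    return fields[0] in lengths and 1 <= start <= end <= lengths[fields[0]]


def drop_bad_transcripts(gtf_lines, lengths):
    """Return (kept_lines, bad_ids).

    Any record that does not fit is dropped, and so is every other record
    sharing its transcript_id, so no partial transcripts are left behind.
    """
    gtf_lines = list(gtf_lines)
    bad_ids = set()
    # First pass: collect transcript ids that own at least one bad record.
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        if not record_fits(line.rstrip("\n").split("\t"), lengths):
            match = TRANSCRIPT_ID.search(line)
            if match:
                bad_ids.add(match.group(1))
    # Second pass: keep comments, blanks, and records of clean transcripts.
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            kept.append(line)
            continue
        if not record_fits(line.rstrip("\n").split("\t"), lengths):
            continue
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in bad_ids:
            continue
        kept.append(line)
    return kept, bad_ids
```

Dropping whole transcripts rather than single exons is a design choice made here to avoid handing a tool truncated gene models; check `bad_ids` before trusting the output, since a long list usually points to a contig naming problem rather than bad coordinates.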

xonq, Dec 15 '20 16:12

I revisited this issue because I ran into the same problem; it turns out that OrthoFiller will pull out fasta headers with spaces... perhaps this is why they recommend cleaning your fasta at the start of the run. Did you run the sed command mentioned in the description to clean your fasta?
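For anyone who prefers not to use sed, a minimal sketch of the same idea is below. It assumes that "cleaning" means keeping only the header text before the first whitespace, which may or may not match the exact sed command in the OrthoFiller description; `clean_fasta_headers` is a made-up name for illustration.

```python
def clean_fasta_headers(lines):
    """Yield FASTA lines with each header cut at the first whitespace.

    Raises ValueError if a header is empty or if trimming would make two
    contigs share the same name.
    """
    seen = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("empty FASTA header")
            name = parts[0]
            if name in seen:
                raise ValueError(f"duplicate contig name after trimming: {name}")
            seen.add(name)
            yield f">{name}\n"
        else:
            yield line  # sequence lines pass through unchanged
```

Writing the yielded lines to a new file (rather than overwriting the original) keeps the download intact, and the trimmed names are what the first column of the gtf would then need to match.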

xonq, Jan 18 '21 18:01