emg-viral-pipeline

Taxonomic ranks are inverted

Open Ales-ibt opened this issue 1 year ago • 11 comments

Hello there!

I've been testing the VIRify v2.0 and I realised that the taxonomic annotation on the GFF file has the ranks inverted.

For instance: taxonomy=Entomopoxvirinae;Poxviridae;Chitovirales

Should be: taxonomy=Chitovirales;Poxviridae;Entomopoxvirinae

And ideally, it would be great to have the whole lineage like: taxonomy=Viruses;Bamfordvirae;Nucleocytoviricota;Pokkesviricetes;Chitovirales;Poxviridae;Entomopoxvirinae
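To make the requested change concrete, here is a minimal Python sketch that reverses the rank order of a `taxonomy=` value. It is not the pipeline's code: the function names `reverse_lineage` and `fix_taxonomy_field` are hypothetical, and it assumes the ranks are separated by `;` exactly as in the example above. How the field is embedded or escaped inside the GFF attributes column is not handled here.

```python
def reverse_lineage(value: str, sep: str = ";") -> str:
    """Reverse the order of the ranks in a lineage string (hypothetical helper)."""
    ranks = [rank.strip() for rank in value.split(sep) if rank.strip()]
    return sep.join(reversed(ranks))


def fix_taxonomy_field(field: str) -> str:
    """Reverse the lineage of a 'taxonomy=...' field; leave other fields untouched."""
    key, _, value = field.partition("=")
    if key != "taxonomy":
        return field
    return f"{key}={reverse_lineage(value)}"


print(fix_taxonomy_field("taxonomy=Entomopoxvirinae;Poxviridae;Chitovirales"))
# prints: taxonomy=Chitovirales;Poxviridae;Entomopoxvirinae
```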

There are also some problems with names like Caudovirales which is shown in the NCBI taxonomy database as Caudoviricetes.

Thanks in advance!

Ales.

Ales-ibt avatar Sep 07 '23 14:09 Ales-ibt

Hey, thx @Ales-ibt !

Yes, I agree, inverting the ranks would probably make more sense. Showing the full lineage should also be possible with the NCBI taxonomy file, right @guille0387?

Regarding Caudovirales vs. Caudoviricetes: Caudovirales should no longer be in the pipeline because the taxon was discontinued by ICTV. We added the following warning message when running VIRify:

Warning: --meta_version v4 does not include the following discontinued virus taxa 
(according to ICTV) anymore and they have been excluded from the dataset.
- Allolevivirus
- Autographivirinae
- Buttersvirus
- Caudovirales
- Chungbukvirus
- Incheonvirus
- Leviviridae
- Levivirus
- Mandarivirus
- Pbi1virus
- Phicbkvirus
- Radnorvirus
- Sitaravirus
- Vidavervirus
- Myoviridae
- Siphoviridae
- Podoviridae
- Viunavirus
- Orthohepevirus
- Klosneuvirus
- Hendrixvirus
- Rubulavirus
- Avulavirus
- Catovirus
- Nucleorhabdovirus
- Viunavirus
- Gammalipothrixvirus
- Peduovirinae
- Sedoreovirinae

Did you still have Caudovirales in your results? Can you try a fresh installation and, most importantly, re-download the database files? An old database file may still have been in use.
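One quick way to check whether results still contain discontinued names is to scan the output lines for them. The sketch below is only an illustration: `find_discontinued` is a hypothetical helper, the set holds just a few names from the warning above, and the tokenisation (splitting on tabs, spaces, `;`, `,` and `=`) is an assumption about how taxon names appear in the output files.

```python
import re

# A few names taken from the warning message above (not the complete list).
DISCONTINUED = {"Caudovirales", "Myoviridae", "Siphoviridae", "Podoviridae", "Leviviridae"}


def find_discontinued(lines, discontinued=DISCONTINUED):
    """Map each discontinued taxon name to the 1-based line numbers where it occurs."""
    hits = {}
    for number, line in enumerate(lines, start=1):
        # Split on the separators assumed to surround taxon names.
        tokens = set(re.split(r"[;\t,=\s]+", line.strip()))
        for name in sorted(tokens & discontinued):
            hits.setdefault(name, []).append(number)
    return hits


example = [
    "contig_1\ttaxonomy=Caudovirales;Myoviridae",
    "contig_2\ttaxonomy=Chitovirales;Poxviridae",
]
print(find_discontinued(example))
# prints: {'Caudovirales': [1], 'Myoviridae': [1]}
```

To scan a real file, pass an open file handle as `lines`. An empty result suggests the current database files are in use.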

hoelzer avatar Sep 15 '23 13:09 hoelzer

Hi @hoelzer @Ales-ibt

Yes, I think it should be possible to invert the order of the ranks and include the complete lineage.... let me have a look into this and I'll get back to you asap.

guille0387 avatar Sep 18 '23 12:09 guille0387

Hi @hoelzer @Ales-ibt

I created a new branch called out_lineage with modifications in the contig taxonomic assignment script. The output should now reflect the suggestions that Ales made. I tested it with the two mock datasets we used in the paper and it worked, but perhaps Ales would like to try it with her own data? Let me know if you have any issues.

guille0387 avatar Sep 19 '23 13:09 guille0387

Great, thx @guille0387! Looks good to me as well. @Ales-ibt can you give it a try too? Thx!

hoelzer avatar Sep 20 '23 09:09 hoelzer

Great, I'll run a test and get back to you soon.

Ales-ibt avatar Sep 21 '23 09:09 Ales-ibt

Hello, sorry for taking so long to get back to you. I updated the NCBI database and now I have the correct Caudoviricetes annotation :D. I also tested the pipeline on the out_lineage branch and I can see the complete lineages beautifully sorted in 08-final/taxonomy/*prodigal_annotation_taxonomy.tsv, thank you so much for this. The only detail is that this fix is not reflected in the GFF output file.

Thank you again!

Ales

Ales-ibt avatar Oct 04 '23 11:10 Ales-ibt

Awesome, thanks for checking, @Ales-ibt !

@guille0387 can you also do the GFF fix? Then we could merge that into dev, @mberacochea.

hoelzer avatar Oct 07 '23 10:10 hoelzer

Excellent, @guille0387! Thank you for that fix. Let me know if you need a hand fixing the GFF.

mberacochea avatar Oct 10 '23 09:10 mberacochea

Hi Martin!

Actually, I might need help with the GFF 😅… I’m not even sure which step of the pipeline generates that file as output… if you could help me out with that it’d be great, or if you could guide me on what to do that’d be great too :)

guille0387 avatar Oct 20 '23 11:10 guille0387

Hey folks,

I'm trying to catch up with the VIRify backlog. There is an excellent PR, #84, that adds support for VirSorter2, so it's a perfect opportunity to make a new release that also includes this fix.

Cheers

mberacochea avatar Jul 20 '24 11:07 mberacochea

Hey, yes, I agree it would be perfect to have another release with VS2 support and some of the current open issues resolved.

I think everything here was solved:

"I created a new branch called out_lineage with modifications in the contig taxonomic assignment script."

Only the change of the taxonomic rank order in the GFF is missing... Ah, or was this done in #129, @mberacochea? Then this issue should be solved.

hoelzer avatar Jul 30 '24 14:07 hoelzer