
Adding fungal HMM

Open danwiththeplan opened this issue 3 years ago • 4 comments

Hi Torsten. We met a few times during my time at La Trobe a few years back. I'm now at Plant and Food Research. We are interested in an open-source alternative to RNAmmer and would be prepared to do some work to extend barrnap so that it can be used for fungal genomes. Is barrnap still under development, and can we contribute? Much appreciated, Dan Jones

danwiththeplan, Dec 14 '20 01:12

Dan, it would be great to see some advancement here on the Fungi - we can collect sets of examples and build the HMMs from the alignments we have, but I am not sure whether we need to focus on clade-specific sets and iterate, or whether a single HMM would be sufficient. I would be happy to talk more about how we are approaching this - there are also some efforts from JGI to extract these regions through their pipelines, so it would be good to connect the dots.

hyphaltip, May 03 '21 21:05

Hi, thanks very much for the response. Yes, I am still progressing this in collaboration with a colleague. I'd be keen to talk sometime about the specifics of what exactly you'd need to progress this. We have plenty of fungal genomes we can mine :)

danwiththeplan, May 03 '21 21:05

Hi, I'm Brogan McGreal, a researcher at Plant and Food Research. Our team focuses on molecular plant-pathogen interactions and I work with @danwiththeplan. We are currently looking to produce a fungal gene annotation pipeline using open-source tools. We'd be happy to discuss what is required to get things working for fungi. I'm not sure what type of inputs you need and in what format, i.e. whether you need fungal genomes, aligned RNA-seq reads, etc. As @danwiththeplan mentioned, we have plenty of fungal genomic and transcriptomic data.

broganmcgreal, May 11 '21 01:05

Hi, if you are interested in fungal genome annotation, I would first point to our project, which has been used successfully for hundreds of genomes I've annotated and deposited, and for many others - https://github.com/nextgenusfs/funannotate/ - Jon has done an incredible job putting together the pipelines, and several others are working on embedding it in Snakemake or Nextflow workflows to enable more efficient annotation.

We have also generated, and work on, several thousand fungal genomes in our projects, so it would be useful to explore what is helpful. In my experience the ribosomal loci are often missing from non-long-read assemblies that do not account for the high coverage of these regions, so a combination of strategies is needed to actually get the ribosomal loci into an assembly, depending on the ratio of their coverage to that of the nuclear genome and on the parameters used.

Mainly, I think that if the aim in this project is to provide better HMMs to support extraction of the ribosomal locus and associated genes, it would make sense to match things up with what ITSx does, which uses HMMs to find the boundaries of ITS1 and ITS2 and to orient them: https://microbiology.se/software/itsx/.

hyphaltip, May 12 '21 05:05