Enable binning of eukaryotic genomes

Open jason-c-kwan opened this issue 4 years ago • 6 comments

Since we are not experts on eukaryotes, perhaps @hyphaltip would have some useful suggestions on how to implement this, but here are my thoughts:

  1. We would need to use a eukaryotic gene finder like Augustus. However, it probably wouldn't be incredibly accurate without RNA data (although I guess it could be an option to include that), and I don't know if anyone has ever tried it on metagenomes.
  2. There is a standard gene set that eukaryotic genomics efforts use to determine how complete their genomes are. Sorry, I don't have a citation to hand. However, while this would give us estimated completeness, I'm not sure it would give us purity, because I don't know whether the genes are all single-copy.
  3. As outlined in the NSF proposal we are planning on checking the taxonomic congruence of single-copy markers we find in bacteria/archaea. So a similar method could be used to estimate purity of eukaryotic bins, perhaps?
  4. It might be a good idea to include contigs that are unclassified at the kingdom level in the analysis. I have long suspected that a lot of the eukaryotic portion ends up there because there are relatively few eukaryotic genomes in the NCBI database.
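
To make points 2 and 3 concrete, here is a rough sketch of how completeness and purity could be estimated for a bin from marker hits. It is only an illustration, not Autometa's actual code: the function name, the input format, and the definitions used (completeness as the fraction of expected markers found, purity as the fraction of found markers present exactly once) are assumptions. The purity figure is only meaningful if the markers really are single-copy, which is exactly the open question in point 2.

```python
from collections import Counter


def completeness_and_purity(marker_hits, expected_markers):
    """Estimate completeness and purity (both in percent) for one bin.

    marker_hits: iterable of marker names, one entry per hit in the bin
                 (hypothetical input format).
    expected_markers: set of marker names assumed to be single-copy.
    """
    if not expected_markers:
        raise ValueError("expected_markers must not be empty")

    # Count hits per marker, ignoring anything outside the expected set.
    counts = Counter(m for m in marker_hits if m in expected_markers)

    n_present = len(counts)
    n_single = sum(1 for c in counts.values() if c == 1)

    # Completeness: share of expected markers seen at least once.
    completeness = 100.0 * n_present / len(expected_markers)
    # Purity: share of observed markers seen exactly once.
    purity = 100.0 * n_single / n_present if n_present else 0.0
    return completeness, purity


if __name__ == "__main__":
    expected = {"m1", "m2", "m3", "m4"}
    hits = ["m1", "m2", "m2", "m3"]  # m2 is duplicated, m4 is missing
    print(completeness_and_purity(hits, expected))
```

A duplicated marker lowers purity and a missing marker lowers completeness, so the same marker table serves both estimates, provided the marker set is suitable.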

I think doing this would broaden the appeal of Autometa, and it would be an interesting project for a student or outside contributor. I have also been asked about this several times at meetings.

If an outside contributor is interested in this, please let us know, because it might be better to work together. Please also base PRs off dev rather than main, because dev is pretty different right now (it is Python 3 and most of the code has been refactored).

jason-c-kwan, Jun 18 '20 12:06