genome-grist

[WIP] add rules for protein mapping

Open bluegenes opened this issue 2 years ago • 1 comments

This PR introduces rules to allow mapping nucleotide reads to protein references using Paladin.

Not functional yet.

Main questions at this point:

  • Do we want to do this only when the user selects protein sourmash? Or do we want to enable running both protein and nucleotide sourmash within the same grist output folder?
    • Mostly what I'm getting at here is whether we want to include the moltype in the gather output filename, since folks might want to run both moltypes. I know I want to run both, but I'm not sure whether this is a general use case.
  • Do we want to store proteomes in the same folder as genbank genomes? Or in a separate folder, e.g. proteomes?
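
To make the two layout questions concrete, here is a rough sketch of what the options could look like. Everything in it is an assumption for discussion only: the helper names (`gather_output_path`, `reference_dir`), the `gather/` subfolder, and the `genbank_genomes` and `proteomes` folder names are hypothetical and do not reflect the current grist layout.

```python
from pathlib import Path


def gather_output_path(outdir, sample, moltype=None):
    """Hypothetical helper: build a gather CSV path.

    If moltype is given (e.g. "protein" or "DNA"), it is embedded in the
    filename so both moltypes can live in the same output folder.
    """
    stem = f"{sample}.gather" if moltype is None else f"{sample}.{moltype}.gather"
    return Path(outdir) / "gather" / f"{stem}.csv"


def reference_dir(base, moltype, separate_proteomes=True):
    """Hypothetical helper: choose where downloaded references are stored.

    With separate_proteomes=True, proteomes get their own folder;
    otherwise they share the (assumed) genomes folder.
    """
    if moltype == "protein" and separate_proteomes:
        return Path(base) / "proteomes"
    return Path(base) / "genbank_genomes"


if __name__ == "__main__":
    # Both moltypes side by side in one output folder.
    print(gather_output_path("outputs", "SRR0000000", "DNA").as_posix())
    print(gather_output_path("outputs", "SRR0000000", "protein").as_posix())
    print(reference_dir(".", "protein").as_posix())
```

With the moltype in the filename, protein and nucleotide runs would not overwrite each other; without it, the existing filenames stay unchanged but only one moltype fits per output folder.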

To do:

  • [ ] get the checkpoints --> download proteomes step working
  • [ ] add a new checkpoint that runs Prodigal to predict a proteome when one is not downloadable
  • [ ] try BBMerge for read merging; fall back to PEAR if we don't like the results
  • [ ] add tests
  • [ ] add reporting and visualization

bluegenes, Feb 10 '22 21:02