
Prune BGCFlow rules

Open • matinnuhamunada opened this issue 2 years ago • 1 comment

I think we should reduce the number of rules, either by:

  • Dropping unused or redundant rules
  • Merging relevant rules into one package

These are the rules that are currently available:

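| Rules | Notebook / Markdown Report | Drop / Merge | Comments |
|---|---|---|---|
| mlst | | | |
| eggnog | | | |
| refseq-masher | | | |
| mash | | | |
| fastani | | | |
| automlst-wrapper | | | |
| roary | | | |
| eggnog-roary | | | |
| seqfu | | | |
| rnammer | | | |
| bigslice | | | |
| query-bigslice | | | |
| checkm | | | |
| gtdbtk | | | |
| prokka-gbk | | | |
| antismash | | | |
| arts | | | |
| diamond | | | |
| diamond-roary | | | |
| deeptfactor | | | |
| deeptfactor-roary | | | |
| cblaster-genome | | | |
| cblaster-bgc | | | |
| patric_meta | | | |
| bigscape | | | |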

matinnuhamunada • Oct 10 '22 06:10

Please find my comments below:

| Rules | Notebook / Markdown Report | Drop / Merge | Comments |
|---|---|---|---|
| mlst | Not needed | Drop | Only useful for pathogenic bacteria, and we are gradually moving towards genome-based classification anyway |
| eggnog | Useful for some smaller projects | Keep for some projects | Low priority, as eggnog-roary is more useful. Table with genome_id as the index, COG categories as columns, and the number of genes in each COG category per genome as values. The per-genome visual (opened by clicking a genome in the table index) can include a bar chart of COG category counts along the genome, using buckets of 100 genes (see attached visual no. 1) |
| refseq-masher | Not needed | Drop | Stopped using it since the database is quite old |
| mash | Very useful | Keep as primary | Hierarchical clustering image and MASH distance heatmap as visuals. Bar chart with the number of genomes per Mash cluster. Table with the Mash cluster assignment for each genome |
| fastani | Useful | Optional secondary | Use MASH distance as the primary report, as it is faster. Hierarchical clustering image and FastANI distance heatmap as visuals. Bar chart with the number of genomes per FastANI cluster. Table with the FastANI cluster assignment for each genome |
| automlst-wrapper | Useful | Keep | Downloadable tree file in Newick format. Possibly include extra tables that can be used in iTOL, with documentation of the steps |
| roary | Very useful | Keep | Visuals can include a pangenome curve, a gene presence heatmap, a table with binary gene presence, a table with gene presence as locus_tags, a table with the list of genes in the pangenome, and a link to the eggnog-roary report |
| eggnog-roary | Very useful | Keep | Visuals with a bar chart of pangenome categories per COG category, a table with eggNOG annotations for each gene in the pangenome (an expansion of the roary pangenome annotation table), and a link to the roary report |
| seqfu | Very useful | Keep | Existing visuals and tables are great |
| rnammer | | Drop | |
| bigslice | | Keep | |
| query-bigslice | | Keep | |
| checkm | Useful | | |
| gtdbtk | | Keep | |
| prokka-gbk | | Make default | |
| antismash | | Keep | |
| arts | | Dev? | |
| diamond | | Drop | Download the database and give an example of how to use it |
| diamond-roary | | Drop | Low priority. Visuals can include a bar chart of TFs predicted in each category of the pangenome. Need to brainstorm visuals |
| deeptfactor | | Drop | Low priority. Visuals can include a histogram of TFs predicted per genome. Need to brainstorm visuals |
| deeptfactor-roary | | Drop | Low priority. Visuals can include a bar chart of TFs predicted in each category of the pangenome. Need to brainstorm visuals |
| cblaster-genome | | Keep | Low priority. Can include a link to download the blastdb |
| cblaster-bgc | | Drop | Low priority. Can include a link to download the blastdb |
| patric_meta | Drop | Drop | Maybe use NCBI metadata instead; I had a hard time linking every genome to PATRIC correctly |
| bigscape | Very important | Keep | Provide more tables that can be loaded into Cytoscape. Add a table with genome metadata on the number of known BGCs, unique BGCs, etc. Add a GCF presence heatmap or table |
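
The eggnog table proposed above (genome_id as the index, COG categories as columns, gene counts as values) could be assembled roughly as follows. This is a minimal standard-library sketch, not BGCFlow code: the input format (one `(genome_id, COG_category)` pair per annotated gene) and the names `cog_count_table` and `to_tsv` are hypothetical, and it assumes that a multi-letter category such as `KT` counts once toward each letter while `-` (no category) is skipped.

```python
from collections import Counter, defaultdict


def cog_count_table(records):
    """Return {genome_id: Counter(COG letter -> number of genes)}.

    Assumption: `records` holds hypothetical (genome_id, COG_category) pairs;
    a multi-letter category counts once per letter, and "-" is ignored.
    """
    table = defaultdict(Counter)
    for genome_id, cog in records:
        for letter in cog:
            if letter.isalpha():
                table[genome_id][letter] += 1
    return table


def to_tsv(table):
    """Render genome_id as the index and COG letters as sorted columns."""
    columns = sorted({c for counts in table.values() for c in counts})
    lines = ["genome_id\t" + "\t".join(columns)]
    for genome_id in sorted(table):
        counts = table[genome_id]  # Counter returns 0 for absent letters
        lines.append(genome_id + "\t" + "\t".join(str(counts[c]) for c in columns))
    return "\n".join(lines)


records = [("G1", "K"), ("G1", "KT"), ("G1", "-"), ("G2", "E")]
print(to_tsv(cog_count_table(records)))
```

Writing the table as TSV keeps the sketch dependency-free; in a notebook the same nested counters could be loaded into a DataFrame instead.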
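
For the mash and fastani rows, the per-genome cluster assignment table and the "genomes per cluster" bar chart both follow from cutting a hierarchical clustering at a distance threshold. The sketch below uses single linkage, which at a fixed cutoff reduces to the connected components of the "distance <= cutoff" graph, computed here with union-find. The linkage method, the hypothetical `single_linkage_clusters` helper, the dictionary of pairwise distances used as input, and the 0.05 cutoff are all illustrative assumptions rather than anything BGCFlow defines.

```python
from collections import Counter

THRESHOLD = 0.05  # illustrative cutoff only, not a BGCFlow default


def single_linkage_clusters(genomes, distances, threshold=THRESHOLD):
    """Assign cluster numbers 1..k by single linkage cut at `threshold`.

    `distances` is a hypothetical {(genome_a, genome_b): distance} dict.
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Path halving keeps the union-find trees shallow.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two components

    # Number clusters in order of first appearance in `genomes`.
    labels, assignment = {}, {}
    for g in genomes:
        root = find(g)
        labels.setdefault(root, len(labels) + 1)
        assignment[g] = labels[root]
    return assignment


genomes = ["G1", "G2", "G3", "G4"]
distances = {("G1", "G2"): 0.01, ("G2", "G3"): 0.04, ("G1", "G3"): 0.06,
             ("G1", "G4"): 0.20, ("G2", "G4"): 0.21, ("G3", "G4"): 0.19}
assignment = single_linkage_clusters(genomes, distances)
cluster_sizes = Counter(assignment.values())  # genomes per cluster, for the bar chart
print(assignment)
print(cluster_sizes)
```

Single linkage can chain genomes together: here G1 and G3 are 0.06 apart yet share a cluster through G2. If that is undesirable, average or complete linkage (for example via `scipy.cluster.hierarchy.linkage` and `fcluster`) is the usual alternative. The same grouping would apply to FastANI output after converting ANI to a distance such as 1 - ANI/100.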
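
The eggnog-roary bar chart of pangenome categories per COG category needs every pangenome gene placed in a category first. A minimal sketch, assuming Roary's usual bins (core: at least 99% of genomes; soft core: 95% to under 99%; shell: 15% to under 95%; cloud: under 15%) and a hypothetical input of `(number of genomes carrying the gene, COG_category)` tuples, for example joined from the `No. isolates` column of Roary's gene presence/absence table and the eggNOG annotations. The function names are illustrative only.

```python
from collections import Counter


def pangenome_category(n_isolates, n_genomes):
    """Roary-style bin for a gene present in n_isolates of n_genomes genomes."""
    share = 100 * n_isolates  # compare percentages with integer arithmetic
    if share >= 99 * n_genomes:
        return "core"
    if share >= 95 * n_genomes:
        return "soft_core"
    if share >= 15 * n_genomes:
        return "shell"
    return "cloud"


def category_by_cog(genes, n_genomes):
    """Count (pangenome category, COG letter) pairs for a grouped bar chart.

    `genes` is a hypothetical list of (n_isolates, COG_category) tuples;
    multi-letter categories count once per letter and "-" is skipped.
    """
    counts = Counter()
    for n_isolates, cog in genes:
        category = pangenome_category(n_isolates, n_genomes)
        for letter in cog:
            if letter.isalpha():
                counts[(category, letter)] += 1
    return counts


genes = [(100, "J"), (96, "K"), (40, "K"), (3, "L"), (2, "-")]
print(category_by_cog(genes, n_genomes=100))
```

Comparing `100 * n_isolates` against `percent * n_genomes` avoids floating-point edge cases exactly at the 99%, 95%, and 15% boundaries.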

Example visual:

  1. Eggnog notebook visual for the per-genome page [image]
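
The per-genome eggnog visual (COG category counts along the genome in buckets of 100 genes) comes down to chunking one genome's genes in order and counting categories per chunk. A minimal sketch, assuming the genes are already sorted in genomic order (for example by locus tag) and each carries a COG_category string; `cog_counts_per_bucket` is a hypothetical name, and its list of counters would feed a stacked bar chart with one bar per bucket.

```python
from collections import Counter

BUCKET_SIZE = 100  # "buckets of 100 genes", as proposed in the eggnog comment


def cog_counts_per_bucket(cog_per_gene, bucket_size=BUCKET_SIZE):
    """Split one genome's genes (assumed in genomic order) into consecutive
    buckets and count COG letters in each; the last bucket may be shorter."""
    buckets = []
    for start in range(0, len(cog_per_gene), bucket_size):
        counts = Counter()
        for cog in cog_per_gene[start:start + bucket_size]:
            # Multi-letter categories count once per letter; "-" is skipped.
            counts.update(letter for letter in cog if letter.isalpha())
        buckets.append(counts)
    return buckets


# Hypothetical genome with 250 genes, giving buckets of 100, 100 and 50 genes.
genome = ["K"] * 120 + ["E"] * 80 + ["-"] * 50
for i, counts in enumerate(cog_counts_per_bucket(genome)):
    print(i, dict(counts))
```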

OmkarSaMo • Oct 11 '22 09:10