bgcflow
Prune BGCFlow rules
I think we should reduce the number of rules, either by:
- Dropping unused or redundant rules
- Merging relevant rules into one package
These are the rules that are currently available:
Rules | Notebook / Markdown Report | Drop / Merge | Comments |
---|---|---|---|
mlst | |||
eggnog | |||
refseq-masher | |||
mash | |||
fastani | |||
automlst-wrapper | |||
roary | |||
eggnog-roary | |||
seqfu | |||
rnammer | |||
bigslice | |||
query-bigslice | |||
checkm | |||
gtdbtk | |||
prokka-gbk | |||
antismash | |||
arts | |||
diamond | |||
diamond-roary | |||
deeptfactor | |||
deeptfactor-roary | |||
cblaster-genome | |||
cblaster-bgc | |||
patric_meta | |||
bigscape | |||
Please find my comments below:
Rules | Notebook / Markdown Report | Drop / Merge | Comments |
---|---|---|---|
mlst | Not needed | Drop | Only useful for pathogenic bacteria, and we are gradually moving towards genome-based classification anyway. |
eggnog | Useful for some smaller projects | Keep for some projects | Low priority, as eggnog-roary is more useful. Table with genome_id as index, COG categories as columns, and the number of genes in each COG category per genome as values. The visual for each genome, opened by clicking its entry in the table index, can include a bar chart of COG category counts along the genome (in buckets of 100 genes). Please find attached example visual no. 1. |
refseq-masher | Not needed | Drop | Stopped using it since the database is quite old. |
mash | Very useful | Keep as primary | Hierarchical clustering image and MASH distance heatmap as visuals. Bar chart with the number of genomes per Mash cluster. Table with the Mash cluster assignment for each genome. |
fastani | Useful | Optional secondary | Use MASH distance as the primary report, as it is faster. Hierarchical clustering image and FastANI distance heatmap as visuals. Bar chart with the number of genomes per FastANI cluster. Table with the FastANI cluster assignment for each genome. |
automlst-wrapper | Useful | Keep | Downloadable tree file in Newick format. Possibly include extra tables that can be loaded into iTOL, with documented steps. |
roary | Very useful | Keep | Visuals can include a pangenome curve and a gene presence heatmap. Tables can include binary gene presence, gene presence with locus_tags, and the list of genes in the pangenome. Link to the eggnog-roary report. |
eggnog-roary | Very useful | Keep | Visuals: bar chart of pangenome categories per COG category. Table with eggnog annotations for each gene in the pangenome (an expansion of the roary pangenome annotation table). Link to the roary report. |
seqfu | Very useful | Keep | Existing visuals and tables are great |
rnammer | Drop | ||
bigslice | Keep | ||
query-bigslice | Keep | ||
checkm | Useful | ||
gtdbtk | Keep | ||
prokka-gbk | Make default | ||
antismash | Keep | ||
arts | Dev? | ||
diamond | Drop | Download the database and give an example of how to use it | |
diamond-roary | Drop | Low priority | Visuals can include bar chart of TFs predicted in each category of pangenome. Need to brainstorm for visuals. |
deeptfactor | Drop | Low priority | Visuals can include histogram of TFs predicted per genome. Need to brainstorm for visuals. |
deeptfactor-roary | Drop | Low priority | Visuals can include bar chart of TFs predicted in each category of pangenome. Need to brainstorm for visuals. |
cblaster-genome | Keep | Low priority | Can include link to download blastdb |
cblaster-bgc | Drop | Low priority | Can include link to download blastdb |
patric_meta | Drop | Drop | Maybe use NCBI metadata instead; I had a hard time linking every genome to PATRIC correctly. |
bigscape | Very important | Keep | Provide more tables that can be loaded into Cytoscape. Add a table of genome metadata (number of known BGCs, unique BGCs, etc.). Add a GCF presence heatmap or table. |
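The mash and fastani rows ask for a table of cluster assignments per genome and a bar chart of genomes per cluster. The clustering method is not specified above, so the sketch below assumes single-linkage clustering at a distance cutoff. The 0.05 default (roughly 95% ANI, since Mash distance approximates 1 − ANI) is an assumption to tune, and the function names are hypothetical, not existing BGCFlow code. It reads the tab-separated `mash dist` output (reference, query, distance, p-value, shared hashes).

```python
from collections import Counter


def parse_mash_dist(lines):
    """Parse `mash dist` output lines into (reference, query, distance) tuples."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        ref, query, dist = line.split("\t")[:3]
        pairs.append((ref, query, float(dist)))
    return pairs


def assign_clusters(pairs, threshold=0.05):
    """Single-linkage clustering: genomes linked by distance <= threshold share a cluster.

    The 0.05 default is an assumed cutoff (about 95% ANI), not a BGCFlow setting.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen genomes as their own root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, dist in pairs:
        root_a, root_b = find(a), find(b)  # every genome is registered, even if unlinked
        if dist <= threshold and root_a != root_b:
            parent[root_b] = root_a

    # Relabel roots as cluster_1, cluster_2, ... in sorted genome order for a stable table.
    labels, assignment = {}, {}
    for genome in sorted(parent):
        root = find(genome)
        labels.setdefault(root, f"cluster_{len(labels) + 1}")
        assignment[genome] = labels[root]
    return assignment


def genomes_per_cluster(assignment):
    """Counts for the 'number of genomes per cluster' bar chart."""
    return Counter(assignment.values())


if __name__ == "__main__":
    example = [
        "A.fna\tA.fna\t0\t0\t1000/1000",
        "A.fna\tB.fna\t0.01\t0\t900/1000",
        "A.fna\tC.fna\t0.2\t0\t10/1000",
    ]
    clusters = assign_clusters(parse_mash_dist(example))
    print(clusters)
    print(genomes_per_cluster(clusters))
```

For FastANI, whose output lists query, reference and ANI in percent, the same `assign_clusters` could be reused by converting each ANI value to a distance with `1 - ani / 100`. Single linkage is simple but can chain distant genomes through intermediates; a hierarchical clustering library (for example SciPy's `scipy.cluster.hierarchy.fcluster`) would also produce the dendrogram for the clustering image.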
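For the roary and eggnog-roary reports, the bar chart of pangenome categories per COG category needs each gene cluster assigned to a category first. A minimal sketch follows, assuming Roary's `gene_presence_absence.csv` (using its `Gene` and `No. isolates` columns) and a hypothetical `cog_by_gene` mapping from Roary gene name to the eggNOG `COG_category` string. How the two tables are actually joined in the rule is an assumption. The thresholds follow Roary's summary statistics (core ≥ 99%, soft core 95–99%, shell 15–95%, cloud < 15% of genomes).

```python
import csv
import io
from collections import Counter


def pangenome_category(n_isolates, n_genomes):
    """Roary summary_statistics thresholds, based on the fraction of genomes carrying the gene."""
    frac = n_isolates / n_genomes
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft_core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


def categorize_genes(presence_csv, n_genomes):
    """Map each Roary gene cluster to a pangenome category via the 'No. isolates' column."""
    reader = csv.DictReader(io.StringIO(presence_csv))
    return {
        row["Gene"]: pangenome_category(int(row["No. isolates"]), n_genomes)
        for row in reader
    }


def cog_counts_per_category(categories, cog_by_gene):
    """Count (pangenome category, COG letter) pairs for the bar chart.

    Multi-letter COG strings (e.g. 'EK') count once per letter; genes without an
    annotation fall back to '-' (assumed convention, matching eggNOG's empty marker).
    """
    counts = Counter()
    for gene, category in categories.items():
        for letter in cog_by_gene.get(gene, "-"):
            counts[(category, letter)] += 1
    return counts
```

The CSV in a real run carries many more columns (one per genome plus Roary's metadata columns); `csv.DictReader` simply ignores the ones not used here.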
Example visual:
- Eggnog notebook visual per genome page
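The per-genome eggnog visual (COG category counts in buckets of 100 genes) and one row of the genome_id × COG category table could be computed as sketched below. This assumes the genome's eggNOG `COG_category` values are already in chromosomal gene order and that unannotated genes are marked `-`, as eggNOG-mapper does; the function names are hypothetical. Counting multi-letter categories such as `EK` once per letter is one possible convention, not a decided one.

```python
from collections import Counter


def cog_counts_by_bucket(cog_categories, bucket_size=100):
    """Split a genome's genes (in chromosomal order) into buckets and count COG letters per bucket."""
    buckets = []
    for start in range(0, len(cog_categories), bucket_size):
        counts = Counter()
        for cog in cog_categories[start:start + bucket_size]:
            # '-' marks genes without a COG assignment; they are not counted.
            counts.update(letter for letter in cog if letter != "-")
        buckets.append(counts)
    return buckets


def cog_table_row(cog_categories):
    """One row of the genome_id x COG category table: genes per COG letter, sorted by letter."""
    total = Counter(
        letter for cog in cog_categories for letter in cog if letter != "-"
    )
    return dict(sorted(total.items()))


if __name__ == "__main__":
    genes = ["J"] * 150 + ["K", "-", "EK"]  # toy genome with 153 genes
    print(cog_counts_by_bucket(genes))
    print(cog_table_row(genes))
```

Each bucket's counter maps directly to one stacked bar in the per-genome chart, and collecting `cog_table_row` results per genome_id gives the summary table.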