scikit-bio-cookbook
recipe for visualizing within vs between distances
@gregcaporaso and I were chatting (offline and in https://github.com/biocore/scikit-bio/issues/764) about adding a recipe showing how to visualize "within" vs "between" distances using scikit-bio (DistanceMatrix), pandas, and seaborn's boxplots. This recipe would basically show how to reproduce QIIME's make_distance_boxplots.py script.
@gregcaporaso suggested using the existing 88 Soils dataset that's already included with the cookbook to discretize pH and plot within/between distance boxplots.
This recipe would also be handy because it'll show how to use seaborn's boxplots with scikit-bio data so that we can deprecate skbio.draw.boxplots (https://github.com/biocore/scikit-bio/issues/764). Finally, it may inspire future additions to the DistanceMatrix API for extracting within/between distances.
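To make the idea concrete, here is a rough sketch of the within/between split, not the eventual recipe. It assumes the distances are available as a square matrix plus sample IDs in matrix order (which is what DistanceMatrix exposes as .data and .ids), and that each sample has already been assigned a category, for example a pH bin. The helper names (within_between_records, plot_within_between), the toy IDs, the toy distances, and the "acidic"/"neutral" bins are all made up for illustration and are not from the 88 Soils dataset.

```python
from itertools import combinations


def within_between_records(data, ids, groups):
    """Label every unordered sample pair as 'within' or 'between'.

    data   : square, symmetric matrix of distances (e.g. DistanceMatrix.data)
    ids    : sample IDs in matrix order (e.g. DistanceMatrix.ids)
    groups : dict mapping sample ID -> category (e.g. a pH bin); hypothetical
    """
    records = []
    # combinations() visits each unordered pair once and skips the diagonal.
    for (i, id_a), (j, id_b) in combinations(enumerate(ids), 2):
        same_group = groups[id_a] == groups[id_b]
        records.append({
            "sample_a": id_a,
            "sample_b": id_b,
            "comparison": "within" if same_group else "between",
            "distance": float(data[i][j]),
        })
    return records


def plot_within_between(records):
    """Draw one box per comparison type with seaborn (imported lazily)."""
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame(records)
    return sns.boxplot(x="comparison", y="distance", data=df)


# Toy data, invented for illustration only.
toy_ids = ["s1", "s2", "s3", "s4"]
toy_data = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
]
toy_groups = {"s1": "acidic", "s2": "acidic", "s3": "neutral", "s4": "neutral"}

toy_records = within_between_records(toy_data, toy_ids, toy_groups)
```

With a real DistanceMatrix dm, the call would presumably look like within_between_records(dm.data, dm.ids, groups), followed by plot_within_between(...) to get the boxplot. Keeping the pair labelling separate from the plotting means the same records could feed other seaborn plots as well.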
@jairideout, I was working on doing just this the other day. It makes use of pandas, seaborn, and DistanceMatrix from skbio. The code is very messy, as I was just using it for visualizations and data munging. I apologize that I don't have time at the moment to write up a recipe, though anyone should feel free to use the code I wrote. There is at least an example of what the within and between boxplots look like a little way down the page.
Here is the notebook
Awesome, thanks @johnchase!