scikit-bio-cookbook

add recipe on using normalized mutual information for RNA secondary structure prediction

Open gregcaporaso opened this issue 10 years ago • 3 comments

This recipe could take as input an alignment of functional RNA sequences and output a matrix of mutual information scores for all pairs of positions in the alignment. That matrix could be plotted as a heatmap, where "hot" diagonals would indicate regions of the sequence that may be base pairing. We can then use this recipe as a place to point readers to more complex methods for doing this.

gregcaporaso, Sep 23 '14 23:09
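
The computation described in the issue body above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not scikit-bio API: the function names (`column_entropy`, `normalized_mutual_information`, `nmi_matrix`) are hypothetical, the toy alignment is made up, gaps would be treated as ordinary characters, and the normalization (mutual information divided by joint entropy) is just one of several conventions that a real recipe might or might not choose.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def normalized_mutual_information(col_i, col_j):
    """MI(i, j) / H(i, j); assumed normalization, other choices exist."""
    h_i = column_entropy(col_i)
    h_j = column_entropy(col_j)
    # Joint entropy: treat each (symbol_i, symbol_j) pair as one symbol.
    h_ij = column_entropy(list(zip(col_i, col_j)))
    if h_ij == 0:
        # Both columns are invariant, so there is no covariation signal.
        return 0.0
    return (h_i + h_j - h_ij) / h_ij


def nmi_matrix(alignment):
    """Symmetric matrix of scores for all pairs of alignment positions."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*alignment))
    n = len(columns)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = normalized_mutual_information(columns[i], columns[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


# Hypothetical toy alignment: positions 0 and 5 covary as Watson-Crick pairs,
# while positions 1-4 are invariant.
toy_alignment = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]
toy_scores = nmi_matrix(toy_alignment)
```

In the toy alignment, the covarying pair of positions (0 and 5) gets the highest possible score, while any pair involving an invariant position scores zero. In a real alignment, runs of high-scoring pairs are what would show up as the "hot" diagonals in a heatmap.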

:+1:

Would it be possible to store the matrix as either a DissimilarityMatrix or DistanceMatrix? Once https://github.com/biocore/scikit-bio/issues/684 is complete, we'll be able to easily create heatmaps with these classes.

jairideout, Sep 23 '14 23:09

I think a DistanceMatrix would work for this, though semantically they're not distances (large values indicate correlation, not dissimilarity).

gregcaporaso, Sep 23 '14 23:09
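
To make the semantic mismatch concrete: if the scores were to be stored in a distance-style container, one possible workaround (not something proposed in this thread) would be to flip them so that large values mean "dissimilar". The sketch below assumes scores normalized to the range [0, 1] and uses a hypothetical helper name, `similarity_to_distance`. It forces a zero diagonal on the assumption that a distance matrix class would expect one, and it deliberately avoids calling any scikit-bio API.

```python
def similarity_to_distance(scores):
    """Convert a square matrix of [0, 1] similarity scores to 1 - score.

    The diagonal is set to zero regardless of the input values, on the
    assumption that a distance-style container requires a hollow matrix.
    """
    n = len(scores)
    distances = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                distances[i][j] = 1.0 - scores[i][j]
    return distances


# Hypothetical two-position example.
example_distances = similarity_to_distance([[1.0, 0.25], [0.25, 1.0]])
```

The cost of this conversion is that the heatmap then highlights low values rather than high ones, which is part of why a plain plotting function on the raw scores may be less confusing.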

Ah, good point. I guess we can figure out the best way to do this when the recipe is ready, since the distance matrix heatmap functionality may not be in an skbio release yet. Maybe just a simple plotting function would suffice, to avoid confusion over the difference in semantics.

jairideout, Sep 23 '14 23:09