msmbuilder-legacy
Update tICA docs
So there's a file docs/tICA/tICA.pdf that has a lot in common with http://msmbuilder.s3-website-us-east-1.amazonaws.com/theory/tICA.html
However, there are several key differences:
- The HTML version is missing the actual MSMBuilder commands.
I also noticed that the PDF version instructs users to download a special fork and branch, which I believe is no longer necessary, right?
OK, I guess what happened is that Robert split the tICA doc into theory and application but never ported the application side of things.
I wonder if it makes more sense to merge them. Not sure.
Do you have a preferred way to generate the atom pairs? Otherwise, I vote that we include a tiny script inside the tutorial (probably at the very end, in a FAQ):
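```python
import itertools
import numpy as np
import mdtraj as md
# Load one structure; only its topology is used
trj = md.load("./system.subset.pdb")
atoms, bonds = trj.topology.to_dataframe()
# Alpha carbons in residues 139-175 (a system-specific selection)
mask = (atoms.name == "CA") & (atoms.resSeq >= 139) & (atoms.resSeq <= 175)
atom_indices = np.where(mask)[0]
# All unique pairs of selected atoms, one "i j" pair per line
atom_pairs = list(itertools.combinations(atom_indices, 2))
np.savetxt("./AtomPairs.dat", atom_pairs, "%d")
```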
Also, I wonder if it might make sense to set stride = 1 for the "beginner" tICA tutorial. Otherwise there are just a lot of parameters exposed to users.
I know you've found the 1 / 10 stride optimal, but I wonder if it makes sense to ignore that fact for illustrative purposes.
Also: the "Drawbacks of TICA" section actually applies to all MSM-like forms of dimensionality reduction, so it might not be necessary.
Thanks for checking this out. I'm going to make these changes (hopefully later today).
Also: IMHO we should eliminate the ProjectInfo.yaml inputs to the scripts, as they are set by default. I think that will help people stay focused on the tICA-specific details.
If you do end up merging the theory + applications into a single sphinx file, could you move the link to sit under the "Documentation" section on the main docs page?
Again, let me know if you run out of time and I can file a PR for some of this stuff.
I split them because the original tICA tutorial LaTeX file I was working from had all this info about downloading a different branch, and I wasn't sure which parts were still relevant. But I knew the theory was current.
I don't think having the theory and practice pages separated is a bad idea, especially if we can put in links between them.
So should I use k-centers or k-medoids when clustering my tICA results? Because we're working in the eigenvector space, I imagine that either one of the following is true:
- k-centers neglects equilibrium density because we're working in right eigenvector space
- k-medoids double-counts equilibrium density because we're working in left eigenvector space
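For reference, greedy k-centers is density-agnostic by construction: each new center is the frame farthest from all existing centers, so a sparsely sampled region gets a center just as readily as a heavily populated one. Below is a minimal plain-NumPy sketch of the textbook greedy (Gonzalez) algorithm on tICA-projected coordinates, assuming Euclidean distance; the function name `kcenters` is hypothetical and this is not msmbuilder's implementation.

```python
import numpy as np

def kcenters(X, k, seed_index=0):
    """Greedy (Gonzalez) k-centers on the rows of X, Euclidean distance."""
    centers = [seed_index]
    # Distance from every frame to its nearest center chosen so far
    dist = np.linalg.norm(X - X[seed_index], axis=1)
    for _ in range(k - 1):
        new = int(np.argmax(dist))  # the farthest frame becomes the next center
        centers.append(new)
        dist = np.minimum(dist, np.linalg.norm(X - X[new], axis=1))
    # Assign every frame to its nearest center
    d = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=-1)
    return np.array(centers), d.argmin(axis=1)

# Toy "tICA coordinates": a dense blob (1,000 frames) and a sparse one (10 frames)
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 0.1, size=(1000, 2)),
                    rng.normal(5.0, 0.1, size=(10, 2))])
centers, labels = kcenters(X, k=2)
print(np.bincount(labels))  # one center per blob, regardless of population
```

With k = 2, the second center lands in the 10-frame blob even though it holds about 1% of the data, which is the sense in which k-centers ignores equilibrium density.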
In practice, I've found that the hybrid k-medoids that msmb implements doesn't change things drastically. If you wanted to do k-means, however, you could probably gain a lot.
I don't know the rigorously correct way to do it, but, for instance, when I used Ward clustering, I could build a 20-state model (just from clustering) that gave me the same model (with slightly faster timescales) as building a 1,000-state model with k-centers.
(In reply to kyleabeauchamp, Thu, Feb 6, 2014 at 10:59 AM; https://github.com/SimTk/msmbuilder/issues/324#issuecomment-34357495)
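A minimal sketch of the Ward approach mentioned above, assuming the tICA projections are already in memory as an `(n_frames, n_components)` NumPy array and using scikit-learn's `AgglomerativeClustering` (not part of msmbuilder). Ward linkage needs memory quadratic in the number of clustered points, so this hypothetical `ward_cluster` helper clusters only every `subsample`-th frame and then assigns all frames to the nearest cluster mean; that subsampling scheme is an assumption, not something described in this thread.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def ward_cluster(projected, n_states, subsample=1):
    """Ward-cluster tICA coordinates of shape (n_frames, n_components).

    Only every `subsample`-th frame is clustered (Ward needs O(n^2) memory);
    every frame is then assigned to the nearest cluster mean.
    """
    sample = projected[::subsample]
    ward = AgglomerativeClustering(n_clusters=n_states, linkage="ward")
    labels = ward.fit_predict(sample)
    centroids = np.array([sample[labels == k].mean(axis=0) for k in range(n_states)])
    # Squared Euclidean distance from every frame to every centroid
    d2 = ((projected[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1), centroids

# Toy "tICA coordinates": three well-separated blobs of 100 frames each
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(c, 0.1, size=(100, 2)) for c in (0.0, 3.0, 6.0)])
assignments, centroids = ward_cluster(X, n_states=3, subsample=2)
```

The resulting `assignments` would play the role of microstate labels for MSM construction.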
Thanks. Have you ever done k-means? E.g. do I have to update Cluster.py?
I think I answered my own question
```
Cluster.py tica atompairs: error: argument alg: invalid choice: 'kmeans' (choose from 'kcenters', 'hybrid', 'clarans', 'sclarans', 'hierarchical')
```
We don't have k-means in msmbuilder. I think I tried it once on my own with scikit-learn, though.
(In reply to kyleabeauchamp, Thu, Feb 6, 2014 at 11:12 AM)
We do have a commented-out KMeans class in clustering.py...
Does it work? I didn't know that.
(In reply to kyleabeauchamp, Thu, Feb 6, 2014 at 11:17 AM)
IMHO it looks highly suspicious.
If we want kmeans, we should definitely just wrap sklearn.
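If k-means were added by wrapping scikit-learn, the wrapper could be as thin as the sketch below. The class name `KMeansClusterer` and its `fit`/`assignments_` interface are hypothetical, not msmbuilder's clustering API. It assumes each trajectory has already been projected into tICA space as a 2-D NumPy array, fits `sklearn.cluster.KMeans` on all frames at once, and splits the labels back into per-trajectory assignments.

```python
import numpy as np
from sklearn.cluster import KMeans

class KMeansClusterer:
    """Hypothetical thin wrapper: k-means over a list of per-trajectory arrays."""

    def __init__(self, n_clusters, random_state=0):
        self._km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)

    def fit(self, trajectories):
        # Fit on all frames at once, then split the labels back per trajectory
        lengths = [len(t) for t in trajectories]
        self._km.fit(np.concatenate(trajectories))
        self.assignments_ = np.split(self._km.labels_, np.cumsum(lengths)[:-1])
        self.cluster_centers_ = self._km.cluster_centers_
        return self

# Two toy trajectories already projected onto three tICs
rng = np.random.default_rng(1)
trajs = [rng.normal(0.0, 0.1, size=(50, 3)), rng.normal(5.0, 0.1, size=(80, 3))]
clusterer = KMeansClusterer(n_clusters=2).fit(trajs)
```

Fitting on the concatenated frames keeps everything in memory, so this step is not streaming; scikit-learn's `MiniBatchKMeans`, which offers `partial_fit`, would be one option for larger datasets.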
Hey, our tICA pipeline is 100% streaming, which is a big memory advantage over the previous RMSD-based pipeline. This is a huge win that we should advertise.
Yeah, though the clustering step isn't streaming. But it can load and project the data in a streaming fashion, so you can gain a lot.
(In reply to kyleabeauchamp, Thu, Feb 6, 2014 at 12:44 PM)
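To illustrate why a tICA fit can stream, here is a minimal sketch that reads one trajectory at a time and accumulates only sums whose size depends on the number of features, not the number of frames. It assumes the symmetrized estimator (each time-lagged pair counted in both directions) and solves the generalized eigenproblem C(τ)v = λΣv by Cholesky whitening. The class `StreamingTICA` and its methods are hypothetical and are not msmbuilder's tICA interface.

```python
import numpy as np

class StreamingTICA:
    """Accumulate symmetrized tICA statistics one trajectory at a time."""

    def __init__(self, n_features, lag):
        self.lag = lag
        self.n = 0
        self.s_mean = np.zeros(n_features)
        self.s_lag = np.zeros((n_features, n_features))
        self.s_cov = np.zeros((n_features, n_features))

    def partial_fit(self, X):
        """X: (n_frames, n_features) features for ONE trajectory."""
        if len(X) <= self.lag:
            return self  # too short to contribute any time-lagged pair
        A, B = X[:-self.lag], X[self.lag:]
        self.n += 2 * len(A)  # every pair counted in both directions
        self.s_mean += A.sum(axis=0) + B.sum(axis=0)
        self.s_lag += A.T @ B + B.T @ A
        self.s_cov += A.T @ A + B.T @ B
        return self

    def solve(self):
        mu = self.s_mean / self.n
        c_lag = self.s_lag / self.n - np.outer(mu, mu)
        sigma = self.s_cov / self.n - np.outer(mu, mu)
        # Solve C(lag) v = lambda * Sigma v by whitening with Sigma = L L^T
        L_inv = np.linalg.inv(np.linalg.cholesky(sigma))
        evals, W = np.linalg.eigh(L_inv @ c_lag @ L_inv.T)
        order = np.argsort(evals)[::-1]
        self.mean_ = mu
        self.eigenvalues_ = evals[order]
        self.components_ = L_inv.T @ W[:, order]  # columns are the tICs
        return self

    def transform(self, X, n_components):
        return (X - self.mean_) @ self.components_[:, :n_components]

rng = np.random.default_rng(0)

def ar1(n, phi):
    # Slowly decorrelating toy coordinate: x[t] = phi * x[t-1] + noise
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

tica = StreamingTICA(n_features=2, lag=1)
for _ in range(3):  # stand-in for loading each trajectory from disk in turn
    slow = ar1(2000, 0.9)
    fast = rng.normal(scale=2.3, size=2000)
    tica.partial_fit(np.column_stack([slow, fast]))
tica.solve()
```

The leading eigenvalue approximates the lag-1 autocorrelation of the slow coordinate, and the first tIC points almost entirely along it.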
~~We probably also want an easy way / tutorial to calculate the tICA projections of each trajectory.~~
Edit: moved this request to a separate issue, as it's not docs related.
Also: we need to cite your paper in the tutorial. It would probably be nice to cite Frank's tICA paper as well.
OK looks like Robert has cited you in the HTML tICA theory guide, so that's already done.
OK, we also need to modify the commands due to the changed tICA load interface.
Do you want to open a PR?
Maybe, if there is a boring talk at BPS next week.
(In reply to Robert McGibbon, Feb 12, 2014 at 9:57 PM)