Kyle Beauchamp
To me, this should be flagged as low priority, given our huge backlog of work and the fact that the 1-letter notation is protein-specific and will not work for residue modifications.
FWIW, I've uploaded a gist (https://gist.github.com/kyleabeauchamp/a926c4bde9460536b0fb) containing some driver code for the residue selection step in MDTraj.
The problem is likely that you also need to include the conda-forge channel `-c conda-forge` to pick up the R packages. However, I also noticed a couple of missing packages...
FWIW, the pysam folks have made a lot of improvements on their VCF parser, which may be helpful if you do decide to refactor.
So is there some "theory" of optimizing noisy objectives that we can defer to? Regarding MBar, how many snapshots are we talking about? It's very possible that we could optimize MBar...
This may take me a couple of days to process, as I'm not super familiar with the code. IMHO Robert might have useful opinions here, as he's thought about "exploration"...
So if you think MBar is really rate-limiting, then I can definitely revisit my work on improving its speed.
It's on my "eventual" to-do list. When that becomes a high priority, harass me on the MBar GitHub issue tracker...
IMHO tab- or whitespace-delimited files are a good compromise between easy parsing and easy reading. CSV files are quite robust against formatting errors, but are a bit harder to...
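To make the parsing trade-off concrete, here is a minimal sketch using only Python's standard `csv` module. The sample data, column names, and the `read_delimited` helper are all made up for illustration; this is not taken from any of the projects discussed above.

```python
import csv
import io

# Hypothetical sample data; the column names are invented for illustration.
tab_text = "name\tvalue\nalpha\t1.5\nbeta\t2.0\n"
csv_text = 'name,note\nalpha,"contains, a comma"\n'


def read_delimited(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Tab-delimited: for simple data, a plain split on each line gives the same rows.
tab_rows = read_delimited(tab_text, "\t")
split_rows = [line.split("\t") for line in tab_text.splitlines()]

# CSV: quoting lets a field contain the delimiter itself, which a naive
# split(",") would break apart.
csv_rows = read_delimited(csv_text, ",")
naive_rows = [line.split(",") for line in csv_text.splitlines()]
```

The tab-delimited text can be read with a one-line `split`, while the quoted CSV field needs a real parser to come back as a single value.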