Improve read perf, store coords as SVector
This substantially reduces the amount of memory used to store coordinates and parse files. While parsing a large mmCIF file, the ~~memory usage dropped by approximately 50%, and read time by 15%.~~ number of allocations was cut by a factor of 3 and the read time dropped 40%.
The most potentially-disruptive change is that the coordinates are now
stored as SVectors instead of Vectors. This means that the coordinates
are now immutable, and you cannot change them in place by manual
indexing. The x!, y!, and z! functions still work, as do in-place
transformations, by making the coords field itself mutable.
Closes #55
Inspired in part by seeing the activity in #45
Codecov Report
Attention: Patch coverage is 97.50000% with 1 line in your changes missing coverage. Please review.
Project coverage is 95.34%. Comparing base (
49dac0c) to head (be81bbf).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/mmcif.jl | 96.87% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #70 +/- ##
==========================================
+ Coverage 95.32% 95.34% +0.01%
==========================================
Files 14 14
Lines 2033 2039 +6
==========================================
+ Hits 1938 1944 +6
Misses 95 95
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Seems like a sensible change to me. I'm surprised how small the diff required to do this is.
Can we get away with this being a non-breaking change? The return type of coords has changed, but is still array-like.
If it has to be breaking, I would be tempted to change from Float64 to Float32 coordinates in the same release.
I agree this is a tricky issue. Fundamentally I'm not sure. Plain type-changes often aren't, but mutable to immutable is a behavior change.
If we decide it's breaking, then we might also evaluate whether there are any savings to be had by reordering some of the struct fields. Julia requires 8-byte alignment for certain field-types, so sizeof(T) can depend on the order of T's fields. That would definitely be a breaking change, since it affects constructors, and we'd only want to do that if the savings seemed worth the chaos.
If we think it's non-breaking, should we get out a 4.5.0 release before merging it? If we go ahead with the notion that this is non-breaking, it feels like a release that would at least raise the risk that we might later decide to yank it.
It's probably safer to consider it breaking, and think about reordering struct fields and moving to Float32 in the same release. I can look at the Float32 switch in the next few weeks unless someone else gets there first.
Either way I will release v4.5.0 shortly.
unless someone else gets there first.
Beat ya 🙂
Not pushed yet, but here's the current state:
julia> c
Chain DA with 401 residues, 0 other molecules, 2947 atoms
julia> Base.summarysize(c; exclude=Model) / 2947
98.67662029182219
That's less than half where we are now, https://github.com/BioJulia/BioStructures.jl/issues/59#issuecomment-2862833937
It's probably best done as a series of changes, though, just in case we regret some of them.
https://github.com/JuliaCollections/OrderedCollections.jl/pull/150 is preparatory work for these additional changes.
Great!
v4.5.0 is at https://github.com/JuliaRegistries/General/pull/130705 :tada:
As the sequence of changes I have planned will ultimately generate merge conflicts for #71, let's wait and get that merged before getting started down this road.