BioStructures.jl icon indicating copy to clipboard operation
BioStructures.jl copied to clipboard

Improve read perf, store coords as SVector

Open timholy opened this issue 8 months ago • 9 comments

This substantially reduces the amount of memory used to store coordinates and parse files. While parsing a large mmCIF file, the ~~memory usage dropped by approximately 50%, and read time by 15%.~~ number of allocations was cut by a factor of 3 and the read time dropped 40%.

The most potentially-disruptive change is that the coordinates are now stored as SVectors instead of Vectors. This means that the coordinates are now immutable, and you cannot change them in place by manual indexing. The x!, y!, and z! functions still work, as do in-place transformations, by making the coords field itself mutable.

Closes #55

timholy avatar May 08 '25 11:05 timholy

Inspired in part by seeing the activity in #45

timholy avatar May 08 '25 11:05 timholy

Codecov Report

Attention: Patch coverage is 97.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.34%. Comparing base (49dac0c) to head (be81bbf).

Files with missing lines Patch % Lines
src/mmcif.jl 96.87% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
+ Coverage   95.32%   95.34%   +0.01%     
==========================================
  Files          14       14              
  Lines        2033     2039       +6     
==========================================
+ Hits         1938     1944       +6     
  Misses         95       95              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar May 08 '25 11:05 codecov[bot]

Seems like a sensible change to me. I'm surprised how small the diff required to do this is.

Can we get away with this being a non-breaking change? The return type of coords has changed, but is still array-like.

If it has to be breaking, I would be tempted to change from Float64 to Float32 coordinates in the same release.

jgreener64 avatar May 08 '25 18:05 jgreener64

I agree this is a tricky issue. Fundamentally I'm not sure. Plain type-changes often aren't, but mutable to immutable is a behavior change.

If we decide it's breaking, then we might also evaluate whether there are any savings to be had by reordering some of the struct fields. Julia requires 8-byte alignment for certain field-types, so sizeof(T) can depend on the order of T's fields. That would definitely be a breaking change, since it affects constructors, and we'd only want to do that if the savings seemed worth the chaos.

timholy avatar May 08 '25 19:05 timholy

If we think it's non-breaking, should we get out a 4.5.0 release before merging it? If we go ahead with the notion that this is non-breaking, it feels like a release that would at least raise the risk that we might later decide to yank it.

timholy avatar May 08 '25 19:05 timholy

It's probably safer to consider it breaking, and think about reordering struct fields and moving to Float32 in the same release. I can look at the Float32 switch in the next few weeks unless someone else gets there first.

Either way I will release v4.5.0 shortly.

jgreener64 avatar May 10 '25 00:05 jgreener64

unless someone else gets there first.

Beat ya 🙂

Not pushed yet, but here's the current state:

julia> c
Chain DA with 401 residues, 0 other molecules, 2947 atoms

julia> Base.summarysize(c; exclude=Model) / 2947
98.67662029182219

That's less than half where we are now, https://github.com/BioJulia/BioStructures.jl/issues/59#issuecomment-2862833937

It's probably best done as a series of changes, though, just in case we regret some of them.

https://github.com/JuliaCollections/OrderedCollections.jl/pull/150 is preparatory work for these additional changes.

timholy avatar May 10 '25 12:05 timholy

Great!

v4.5.0 is at https://github.com/JuliaRegistries/General/pull/130705 :tada:

jgreener64 avatar May 11 '25 22:05 jgreener64

As the sequence of changes I have planned will ultimately generate merge conflicts for #71, let's wait and get that merged before getting started down this road.

timholy avatar May 14 '25 08:05 timholy