mdanalysis
mdanalysis copied to clipboard
No need to redirect to the original frame explicitly during deserialization
Fixes #3723 Partial fix: #3721
Changes made in this Pull Request:
- no redirection to the original frame (explicitly) during deserialization
- TextIOPicklable serializes the raw class instance instead of class name.
PR Checklist
- [x] Tests?
- [x] Docs?
- [x] CHANGELOG updated?
- [x] Issue raised/referenced?
Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2024-04-03 14:10:03 UTC
Codecov Report
Attention: Patch coverage is 95.00000%
with 1 lines
in your changes are missing coverage. Please review.
Project coverage is 93.64%. Comparing base (
73acc9b
) to head (1a6272e
).
Files | Patch % | Lines |
---|---|---|
package/MDAnalysis/lib/picklable_file_io.py | 94.44% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## develop #3722 +/- ##
===========================================
- Coverage 93.66% 93.64% -0.02%
===========================================
Files 168 180 +12
Lines 21248 22332 +1084
Branches 3917 3918 +1
===========================================
+ Hits 19902 20913 +1011
- Misses 888 961 +73
Partials 458 458
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Once you need someone to review, feel free to ping/request reviewers.
I'd also suggest using the "convert to draft" in GitHub instead of just adding "WIP" to the issue title as it shows more intentionally if you currently want reviews or not.
This is probably an improvement, but I think it'll be a change in behaviour so we'll need to raise warnings etc.
The behaviour change is how the reading location TextIOPicklable
is recovered. Before it is done by universe.trajectory[i]
. Now it is directly serialized. Not sure if there's any explicitly warning that should be raised anywhere?
@richardjgowers Okay, I never thought about preserving the modification of coordinates (before).
But the behavior does change after this PR, i.e. now the modification is preserved (because no data reloading from the file is performed after this line is removed) which is more "correct".
u = mda.Universe(XYZ)
print('Orig pos:,', u.trajectory.ts.positions[0])
u.trajectory.ts.positions[0] = np.array([1, 2, 3])
print('New pos:,', u.trajectory.ts.positions[0])
u_new = pickle.loads(pickle.dumps(u))
print('After serializing pos:,', u_new.trajectory.ts.positions[0])
# original output
Orig pos:, [ 0.931 17.318 16.423]
New pos:, [1. 2. 3.]
After serializing pos:, [ 0.931 17.318 16.423]
# new output after #PR 3722
Orig pos:, [ 0.931 17.318 16.423]
New pos:, [1. 2. 3.]
After serializing pos:, [1. 2. 3.]
@yuxuanzhuang thanks for adding the test - it looks like it found that h5md format has a custom serialisation mechanism
Should work now! (too many formats to keep track of :) )
@yuxuanzhuang could you resolve the conflicts please?
@yuxuanzhuang and @richardjgowers this PR still looks relevant. Why did it stall? What needs to be done to complete it?
Linter Bot Results:
Hi @yuxuanzhuang! Thanks for making this PR. We linted your code and found the following:
Some issues were found with the formatting of your code.
Code Location | Outcome |
---|---|
main package | ⚠️ Possible failure |
testsuite | ⚠️ Possible failure |
Please have a look at the darker-main-code
and darker-test-code
steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/8540187312/job/23396655061
Please note: The black
linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!
@yuxuanzhuang could you resolve the conflicts please?
Done! Should I worry about the linter bot?
@yuxuanzhuang could you resolve the conflicts please?
Done! Should I worry about the linter bot?
Unless it outlines a flake8 failure it's ok to ignore.
Just checked - no flake8 so you're good to go.
I understand why TNGReader needs to re-read the current frame during serialization; however, due to this, the modification of ts
cannot be preserved in the TNG format. If I comment it out, all the tests will pass.
# make sure we re-read the current frame to update C level objects in
# the file iterator
self._read_frame(self._frame)
Do you think it's reasonable to only make sure _file_iterator
reads the step but no change of ts
? @hmacdope
# convert from frames to integrator steps
step = self._frame_to_step(self._frame + 1)
iterator_step = self._file_iterator.read_step(step)
I understand why TNGReader needs to re-read the current frame during serialization; however, due to this, the modification of
ts
cannot be preserved in the TNG format. If I comment it out, all the tests will pass.# make sure we re-read the current frame to update C level objects in # the file iterator self._read_frame(self._frame)
Do you think it's reasonable to only make sure
_file_iterator
reads the step but no change ofts
? @hmacdope# convert from frames to integrator steps step = self._frame_to_step(self._frame + 1) iterator_step = self._file_iterator.read_step(step)
Just checking my understanding @yuxuanzhuang, this means that the coordinate data will be re-read, but that any additional data in ts
will become invalid? I am just not sure exactly what is meant by change of ts
. Perhaps a super quick example that demos the difference?
Sorry about this mess, tbh this is being caused by my lazyness from way back where I didn't make the cython level classes pickleable in pytng
(its a bit complicated for __cinnit__
using classes). See https://github.com/MDAnalysis/pytng/issues/97. If I am understanding correctly it would be better to just be able to pickle everything there, rather than forcing a re-read of the step? I can look at addressing this but may take a while as super busy. Perhaps better to go for an interim solution that you suggested above?
@hmacdope and @richardjgowers can you please check if your raised concerns have been addressed?
If yes, can we merge?
Otherwise, are there bigger issues that we cannot solve?
I am just not sure exactly what is meant by change of ts. Perhaps a super quick example that demos the difference?
I mean modification of e.g. coordinates.
u = mda.Universe(TNG_traj_gro, TNG_traj)
reader = u.trajectory
reader.ts.positions[0] = np.array([1, 2, 3])
reader_p = pickle.loads(pickle.dumps(reader))
assert_equal(reader.ts, reader_p.ts,
"Modification of ts not preserved after serialization")
Previously, during unpickling, self._read_frame(self._frame)
will 1) update self._file_iterator
to the current frame. 2) modify self.ts
by reading from self._file_iterator
. This will discard any change to the timestep itself.
Now, only self._file_iterator
will be reopened and updated. self.ts
will be preserved during serialization.
@richardjgowers is your review still pertinent or should I just dismiss it?