mdanalysis
mdanalysis copied to clipboard
ChainReaders now apply their own transformations
Closes #3343
Takes into account over-transformation of SingleFrameReaders, which
never re-create their ts when re-accessed.
PR Checklist
- [x] Tests?
- [x] Docs?
- [x] CHANGELOG updated?
- [x] Issue raised/referenced?
Hello @mnmelo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Line 64:59: E231 missing whitespace after ',' Line 64:62: E231 missing whitespace after ',' Line 170:40: E251 unexpected spaces around keyword / parameter equals Line 170:42: E251 unexpected spaces around keyword / parameter equals Line 174:40: E251 unexpected spaces around keyword / parameter equals Line 174:42: E251 unexpected spaces around keyword / parameter equals Line 199:36: E251 unexpected spaces around keyword / parameter equals Line 199:38: E251 unexpected spaces around keyword / parameter equals Line 203:36: E251 unexpected spaces around keyword / parameter equals Line 203:38: E251 unexpected spaces around keyword / parameter equals Line 206:36: E251 unexpected spaces around keyword / parameter equals Line 206:38: E251 unexpected spaces around keyword / parameter equals Line 218:36: E251 unexpected spaces around keyword / parameter equals Line 218:38: E251 unexpected spaces around keyword / parameter equals
Comment last updated at 2021-06-08 09:44:20 UTC
To prevent over-transformation of SingleFrameReaders this now executes an active_reader.rewind() if the active_reader is a SingleFrameReader. It works, but might penalize performance because the reader will read the Timestep at least twice.
I'm exploring the possibility that the ChainReader stores an unmodified Timestep along with the sub-reader.
EDIT: Done! At least now only Timesteps are copied for every access, no need to re-parse single-frames.
Codecov Report
Base: 93.52% // Head: 93.52% // Increases project coverage by +0.00% :tada:
Coverage data is based on head (
c1a6da4) compared to base (f5efa42). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## develop #3344 +/- ##
========================================
Coverage 93.52% 93.52%
========================================
Files 190 190
Lines 25028 25020 -8
Branches 3542 3543 +1
========================================
- Hits 23407 23400 -7
Misses 1100 1100
+ Partials 521 520 -1
| Impacted Files | Coverage Δ | |
|---|---|---|
| package/MDAnalysis/coordinates/base.py | 94.69% <ø> (ø) |
|
| package/MDAnalysis/core/_get_readers.py | 92.10% <ø> (ø) |
|
| package/MDAnalysis/coordinates/chain.py | 88.01% <100.00%> (+0.01%) |
:arrow_up: |
| MDAnalysisTests/coordinates/base.py | 94.46% <0.00%> (ø) |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
How does that work with the memory reader? This was the main reason the ChainReader is currently differing the transformations for the sub-readers.
Good point. Will check. Tests passed, though this may just mean we need more tests for these conditions 😉
I dug deeper and, from what I could understand, @jbarnoud, there should be no conflict with the MemoryReader. Here's the several cases that I see may happen when loading a Universe with a ChainReader:
in_memoryrequested at Universe creation time andtransformationspassed: in this case coordinates are first read into memory, and only then transformations added and applied. It will always be the MemoryReader (which has, by then, replaced the ChainReader) that will apply them, and this shouldn't differ from current behavior;transfer_to_memoryissued after the Universe is loaded with transformations: in this case timesteps will be transformed by the ChainReader, then read into memory. This seems to also be the current behavior.add_transformationsissued after the trajectory is in memory: this is essentially the same order of events as during Universe initialization.
Did you have any specific conflicting use-case in mind that I can double check? Perhaps there may be contrived corner cases where one of a ChainReader's sub-readers is a MemoryReader (is this even possible?). Was this what you were thinking about?
The case that I expect to break is if you have a memory reader within the chain reader. That could happen if one of the trajectories in the chain is an array of coordinates. Then you have a special case with the chain reader and a special case for the memory reader, question is : how do these special case work with each other? On 7 Jun 2021 02:51, Manuel Nuno Melo @.***> wrote: I dug deeper and, from what I could understand, @jbarnoud, there should be no conflict with the MemoryReader. Here's the several cases that I see may happen when loading a Universe with a ChainReader: in_memory requested at Universe creation time and transformations passed: in this case coordinates are first read into memory, and only then transformations added and applied. It will always be the MemoryReader (which has, by then, replaced the ChainReader) that will apply them, and this shouldn't differ from current behavior;transfer_to_memory issued after the Universe is loaded with transformations: in this case timesteps will be transformed by the ChainReader, then read into memory. This seems to also be the current behavior.add_transformations issued after the trajectory is in memory: this is essentially the same order of events as during Universe initialization. Did you have any specific conflicting use-case in mind that I can double check? Perhaps there may be contrived corner cases where one of a ChainReader's sub-readers is a MemoryReader (is this even possible?). Was this what you were thinking about?
—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or unsubscribe.
Ok, so maybe not so much a corner case.
In that situation we have indeed a problem, in that the ChainReader risks applying the same transform multiple times, cumulatively, on the MemoryReader. I'm trying to make the ChainReader aware of this and somehow keep track of the transformed sub-frames.
The case of ChainReader with MemoryReader sub-readers now works well with transforms at the ChainReader level.
There is a minor bug in ChainReader.__repr__ when using MemoryReader sub-readers (#3349). It's also fixed in this PR. (the fact that no-one had reported this bug indicates that ChainReaders with MemoryReader sub-readers are possibly an infrequent use-case)
The tests are passing, but there are some problems connecting to Codecov from GH Actions CI (a consequence of the fastly crash?). I keep re-running the GH Actions CI tests but there's always one or two that can't complete that post-test step. Would a rerun of all CI help? (how can I even trigger it?)
The tests are passing, but there are some problems connecting to Codecov from GH Actions CI (a consequence of the fastly crash?). I keep re-running the GH Actions CI tests but there's always one or two that can't complete that post-test step. Would a rerun of all CI help? (how can I even trigger it?)
@mnmelo I didn't get to see what happened. Maybe let's see what happens when this current round finishes running? There's a few options to re-run all the CI (the easiest is just to close and re-open the PR), but it's not necessary at the moment.
Seemed to have worked this time. We can ignore the codecov project report (since it's just affected by the reduced number of lines).
Given the slew of ChainReader-related issues, I'd like to breath some new life into this PR...