mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

ChainReaders now apply their own transformations

Open mnmelo opened this issue 4 years ago • 13 comments

Closes #3343

Takes into account over-transformation of SingleFrameReaders, which never re-create their ts when re-accessed.

PR Checklist

  • [x] Tests?
  • [x] Docs?
  • [x] CHANGELOG updated?
  • [x] Issue raised/referenced?

mnmelo avatar May 31 '21 16:05 mnmelo

Hello @mnmelo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 64:59: E231 missing whitespace after ',' Line 64:62: E231 missing whitespace after ',' Line 170:40: E251 unexpected spaces around keyword / parameter equals Line 170:42: E251 unexpected spaces around keyword / parameter equals Line 174:40: E251 unexpected spaces around keyword / parameter equals Line 174:42: E251 unexpected spaces around keyword / parameter equals Line 199:36: E251 unexpected spaces around keyword / parameter equals Line 199:38: E251 unexpected spaces around keyword / parameter equals Line 203:36: E251 unexpected spaces around keyword / parameter equals Line 203:38: E251 unexpected spaces around keyword / parameter equals Line 206:36: E251 unexpected spaces around keyword / parameter equals Line 206:38: E251 unexpected spaces around keyword / parameter equals Line 218:36: E251 unexpected spaces around keyword / parameter equals Line 218:38: E251 unexpected spaces around keyword / parameter equals

Comment last updated at 2021-06-08 09:44:20 UTC

pep8speaks avatar May 31 '21 16:05 pep8speaks

To prevent over-transformation of SingleFrameReaders this now executes an active_reader.rewind() if the active_reader is a SingleFrameReader. It works, but might penalize performance because the reader will read the Timestep at least twice.

I'm exploring the possibility that the ChainReader stores an unmodified Timestep along with the sub-reader.

EDIT: Done! At least now only Timesteps are copied for every access, no need to re-parse single-frames.

mnmelo avatar May 31 '21 16:05 mnmelo

Codecov Report

Base: 93.52% // Head: 93.52% // Increases project coverage by +0.00% :tada:

Coverage data is based on head (c1a6da4) compared to base (f5efa42). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #3344   +/-   ##
========================================
  Coverage    93.52%   93.52%           
========================================
  Files          190      190           
  Lines        25028    25020    -8     
  Branches      3542     3543    +1     
========================================
- Hits         23407    23400    -7     
  Misses        1100     1100           
+ Partials       521      520    -1     
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/base.py 94.69% <ø> (ø)
package/MDAnalysis/core/_get_readers.py 92.10% <ø> (ø)
package/MDAnalysis/coordinates/chain.py 88.01% <100.00%> (+0.01%) :arrow_up:
MDAnalysisTests/coordinates/base.py 94.46% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar May 31 '21 16:05 codecov[bot]

How does that work with the memory reader? This was the main reason the ChainReader is currently differing the transformations for the sub-readers.

jbarnoud avatar Jun 04 '21 19:06 jbarnoud

Good point. Will check. Tests passed, though this may just mean we need more tests for these conditions 😉

mnmelo avatar Jun 05 '21 10:06 mnmelo

I dug deeper and, from what I could understand, @jbarnoud, there should be no conflict with the MemoryReader. Here's the several cases that I see may happen when loading a Universe with a ChainReader:

  1. in_memory requested at Universe creation time and transformations passed: in this case coordinates are first read into memory, and only then transformations added and applied. It will always be the MemoryReader (which has, by then, replaced the ChainReader) that will apply them, and this shouldn't differ from current behavior;
  2. transfer_to_memory issued after the Universe is loaded with transformations: in this case timesteps will be transformed by the ChainReader, then read into memory. This seems to also be the current behavior.
  3. add_transformations issued after the trajectory is in memory: this is essentially the same order of events as during Universe initialization.

Did you have any specific conflicting use-case in mind that I can double check? Perhaps there may be contrived corner cases where one of a ChainReader's sub-readers is a MemoryReader (is this even possible?). Was this what you were thinking about?

mnmelo avatar Jun 07 '21 01:06 mnmelo

The case that I expect to break is if you have a memory reader within the chain reader. That could happen if one of the trajectories in the chain is an array of coordinates. Then you have a special case with the chain reader and a special case for the memory reader, question is : how do these special case work with each other? On 7 Jun 2021 02:51, Manuel Nuno Melo @.***> wrote: I dug deeper and, from what I could understand, @jbarnoud, there should be no conflict with the MemoryReader. Here's the several cases that I see may happen when loading a Universe with a ChainReader: in_memory requested at Universe creation time and transformations passed: in this case coordinates are first read into memory, and only then transformations added and applied. It will always be the MemoryReader (which has, by then, replaced the ChainReader) that will apply them, and this shouldn't differ from current behavior;transfer_to_memory issued after the Universe is loaded with transformations: in this case timesteps will be transformed by the ChainReader, then read into memory. This seems to also be the current behavior.add_transformations issued after the trajectory is in memory: this is essentially the same order of events as during Universe initialization. Did you have any specific conflicting use-case in mind that I can double check? Perhaps there may be contrived corner cases where one of a ChainReader's sub-readers is a MemoryReader (is this even possible?). Was this what you were thinking about?

—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or unsubscribe.

jbarnoud avatar Jun 07 '21 07:06 jbarnoud

Ok, so maybe not so much a corner case.

In that situation we have indeed a problem, in that the ChainReader risks applying the same transform multiple times, cumulatively, on the MemoryReader. I'm trying to make the ChainReader aware of this and somehow keep track of the transformed sub-frames.

mnmelo avatar Jun 07 '21 19:06 mnmelo

The case of ChainReader with MemoryReader sub-readers now works well with transforms at the ChainReader level. There is a minor bug in ChainReader.__repr__ when using MemoryReader sub-readers (#3349). It's also fixed in this PR. (the fact that no-one had reported this bug indicates that ChainReaders with MemoryReader sub-readers are possibly an infrequent use-case)

mnmelo avatar Jun 08 '21 02:06 mnmelo

The tests are passing, but there are some problems connecting to Codecov from GH Actions CI (a consequence of the fastly crash?). I keep re-running the GH Actions CI tests but there's always one or two that can't complete that post-test step. Would a rerun of all CI help? (how can I even trigger it?)

mnmelo avatar Jun 09 '21 10:06 mnmelo

The tests are passing, but there are some problems connecting to Codecov from GH Actions CI (a consequence of the fastly crash?). I keep re-running the GH Actions CI tests but there's always one or two that can't complete that post-test step. Would a rerun of all CI help? (how can I even trigger it?)

@mnmelo I didn't get to see what happened. Maybe let's see what happens when this current round finishes running? There's a few options to re-run all the CI (the easiest is just to close and re-open the PR), but it's not necessary at the moment.

IAlibay avatar Jun 09 '21 10:06 IAlibay

Seemed to have worked this time. We can ignore the codecov project report (since it's just affected by the reduced number of lines).

IAlibay avatar Jun 09 '21 10:06 IAlibay

Given the slew of ChainReader-related issues, I'd like to breath some new life into this PR...

orbeckst avatar Feb 13 '23 20:02 orbeckst