mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

feature: IMDReader Integration

Open amruthesht opened this issue 1 year ago • 12 comments

Fixes #4827

This draft PR addresses the feature request discussed in #4827.

Note: The IMDReader feature which was previously a part of the imdclient package has been moved into MDAnalysis below. Any other modules have been in retained in imdclient, which has been added as an optional dependency here. We are currently in the process of splitting the imdclient package as mentioned above. (Issue, PR)

Major changes made in this Pull Request:

  • IMDReader, other associated base classes and a utility function were added to coordinates in the main package.
  • Appropriate test cases for the reader were added as a part of test-imd.py
  • Other corresponding changes were made to CI and environment *.yaml files
  • imdclient was added as an optional dependency

PR Checklist

  • [x] Issue raised/referenced?
  • [x] Tests updated/added?
  • [ ] Documentation updated/added?
  • [ ] package/CHANGELOG file updated?
  • [ ] Is your name in package/AUTHORS? (If it is not, add it!)

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--4923.org.readthedocs.build/en/4923/

amruthesht avatar Feb 19 '25 22:02 amruthesht

Codecov Report

:x: Patch coverage is 96.53179% with 6 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 93.88%. Comparing base (5d48c5c) to head (ea85bb5). :warning: Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
package/MDAnalysis/coordinates/IMD.py 93.02% 0 Missing and 6 partials :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4923      +/-   ##
===========================================
+ Coverage    93.86%   93.88%   +0.02%     
===========================================
  Files          179      180       +1     
  Lines        22249    22422     +173     
  Branches      3161     3186      +25     
===========================================
+ Hits         20885    21052     +167     
  Misses         902      902              
- Partials       462      468       +6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Feb 19 '25 22:02 codecov[bot]

Your thoughts on this are appreciated - @orbeckst

amruthesht avatar Feb 19 '25 23:02 amruthesht

I set the PR to Work in progress for the time being, just to indicate that we're not yet at the stage where the CI is working. Once the tests run properly, we can update the status.

Obviously, this shouldn't discourage anyone from contributing and commenting.

orbeckst avatar Feb 19 '25 23:02 orbeckst

@amruthesht do we have a plan for pushing ahead with suggested changes?

hmacdope avatar Mar 27 '25 09:03 hmacdope

@amruthesht do we have a plan for pushing ahead with suggested changes?

Working on the imdclient repository right now. I had a few small issues there with using the IMDReader, but getting them fixed soon. And then I'll wrap up some small changes on the mdanalysis end and we should be good to go!

amruthesht avatar Mar 27 '25 15:03 amruthesht

@amruthesht kicking CI here.

hmacdope avatar May 05 '25 15:05 hmacdope

@yuxuanzhuang would you be able to have a run through?

hmacdope avatar May 05 '25 15:05 hmacdope

I will have another look at the PR this week. One immediate thing is to add documentation.

Please refer to the comments below.

I read through and nothing jumped out at me that wasn't already mentioned, except that I didn't see these changes in the documentation that was linked. The following needs to be added:

Add doc/sphinx/source/documentation_pages/coordinates/IMD.rst .. automodule:: MDAnalysis.coordinates.IMD

doc/sphinx/source/documentation_pages/coordinate_modules.rst coordinates/IMD

doc/sphinx/source/documentation_pages/references.rst If you use IMD capability...

yuxuanzhuang avatar May 05 '25 16:05 yuxuanzhuang

Need to figure out tests also @amruthesht

hmacdope avatar May 05 '25 19:05 hmacdope

Also need to check if transformations work correctly.

hmacdope avatar Jun 07 '25 07:06 hmacdope

@amruthesht I have unblocked the CI so we can make some progress here (sorry for delay).

One complication, we are having some issues with the conda package for imdclient which as far as I can see is responsible for the circular import issues (unless someone else can see different). Would we be able to cut a new release of imdclient and remove mdanalysis as a recipe dependency? (https://github.com/Becksteinlab/imdclient/issues/65)

I assume this is okay as is only used for testing in the imdclient

I ran some tests locally and everything passed with a bleeding edge install of imdclient.

hmacdope avatar Jun 23 '25 11:06 hmacdope

On a separate note @yuxuanzhuang would you be able to chip in here with some help on either docs or test or other outstanding items?

hmacdope avatar Jun 23 '25 11:06 hmacdope

Fail here is due to a circular import which I think comes from https://github.com/Becksteinlab/imdclient/blob/main/imdclient/IMD.py#L11-L12.

Removal of MDA as a direct dep in https://github.com/Becksteinlab/imdclient/issues/65#issuecomment-3002195966 will likely fix this issue, but I also could be wrong about the source of circular import. Does that jive with your understanding @orbeckst?

hmacdope avatar Jun 26 '25 23:06 hmacdope

Yes, IMD.py seems to be the culprit.

This problem will be gone once we have completed the split in imdclient.

orbeckst avatar Jun 27 '25 01:06 orbeckst

The tests seem to be memory-consuming (which leads to the windows machine error?)

https://dev.azure.com/mdanalysis/mdanalysis/_build/results?buildId=8207&view=logs&j=86224841-1158-5011-fbb7-7c2644c2081b&t=99e15d79-615b-5891-6757-ff041fa8ce2e

##[warning]Free memory is lower than 5%; Currently used: 96.93%

Aside from that, this PR is in good shape. Maybe increase the coverage as well.

yuxuanzhuang avatar Jun 28 '25 23:06 yuxuanzhuang

The tests seem to be memory-consuming (which leads to the windows machine error?)

https://dev.azure.com/mdanalysis/mdanalysis/_build/results?buildId=8207&view=logs&j=86224841-1158-5011-fbb7-7c2644c2081b&t=99e15d79-615b-5891-6757-ff041fa8ce2e

##[warning]Free memory is lower than 5%; Currently used: 96.93%

Aside from that, this PR is in good shape. Maybe increase the coverage as well.

@amruthesht @ljwoods2 is high memory use a known thing with the IMDReader?

hmacdope avatar Jun 29 '25 00:06 hmacdope

@hmacdope It looks like these VMs have 7GB RAM. In theory, the IMDClient should only be using 10MB of mem during each test by default for numpy arrays in IMDFrames. Could be a deeper issue, but first we should probably answer

  • Is pytest running all of the different IMDReader tests in parallel? If that's the case, I can see all the memory getting wiped out. In this case we can use the buffer_size kwarg to reduce memory usage during tests, or un-parallelize these.
  • Is it certain the new tests are causing this?
  • Could this be garbage collection related? As in, each test is properly using 10MB and running sequentially, but things aren't being cleaned up fast enough/ at all?

If it's nothing else, it will require a deeper look at mem usage, I will try to reproduce if that's the case

ljwoods2 avatar Jun 29 '25 01:06 ljwoods2

  • Is pytest running all of the different IMDReader tests in parallel? If that's the case, I can see all the memory getting wiped out. In this case we can use the buffer_size kwarg to reduce memory usage during tests, or un-parallelize these.

From memory its run in parallel, I will look into forcing serial? Seems ok on ubuntu-latest, we could just control with an env var at worst.

hmacdope avatar Jun 29 '25 02:06 hmacdope

With tests available and passing, this PR can definitely come out of draft status.

orbeckst avatar Jul 01 '25 01:07 orbeckst

With tests available and passing, this PR can definitely come out of draft status.

Super exciting! In theory should be working, just final tweaks.

hmacdope avatar Jul 01 '25 02:07 hmacdope

@yuxuanzhuang would you be able to chip away at some of the docs tweaks and small things that need cleaning up?

hmacdope avatar Jul 07 '25 09:07 hmacdope

I’ve added tests to cover most of the newly introduced code. I had to separate the test for the IMD version into its own file, as it modifies sys.modules and interferes with other tests. If someone can find a way to make it work without causing side effects, that would be preferable.

One other point worth discussing is the division of attributes between StreamReaderBase and IMDReader. I moved the initialization of self._frame to StreamReaderBase because some functions assume this attribute exists in the base class. Not sure about whether n_atoms should also be moved.

yuxuanzhuang avatar Jul 13 '25 20:07 yuxuanzhuang

@orbeckst I have updated pins here for 0.2.2 and added the requisite minimum versions where appropriate. I think we might be almost good to go here.

hmacdope avatar Jul 21 '25 05:07 hmacdope

@hmacdope updated docs with buffer_size info and clarified difference between v2 and v3

ljwoods2 avatar Jul 24 '25 23:07 ljwoods2

I did another read through the docs and found a few things. We're clearly converging to the complete version of code + docs + tests. Please also check the linter feedback — it's an easy green light in the CI ;-).

@amruthesht please ping me when you have addressed my remaining comments and need a review again. Thanks!

orbeckst avatar Aug 06 '25 16:08 orbeckst

I did another read through the docs and found a few things. We're clearly converging to the complete version of code + docs + tests. Please also check the linter feedback — it's an easy green light in the CI ;-).

@amruthesht please ping me when you have addressed my remaining comments and need a review again. Thanks!

@orbeckst - Addressed most of your comments. Please let me know if any further changes are needed!

amruthesht avatar Aug 07 '25 19:08 amruthesht

@yuxuanzhuang @IAlibay could you please check if your comments have been addressed and if that's the case, approve the PR?

orbeckst avatar Aug 27 '25 20:08 orbeckst

@hmacdope @jaclark5 do you have any remaining questions or are you happy to move forward with merging the functionality?

(I assume that @jpkrowe would also be happy if he could use the develop branch directly in his GSOC streaming project).

orbeckst avatar Aug 27 '25 20:08 orbeckst

@amruthesht I think cd21d6bf4892bc8172f905816beffc1b6fff8369 has messed up the import guard somehow. I would consider reverting possibly.

hmacdope avatar Sep 08 '25 05:09 hmacdope

@hmacdope @yuxuanzhuang what needs to be done to get the PR merged from your end?

(I had a cursory look at @IAlibay 's https://github.com/MDAnalysis/mdanalysis/pull/4923#pullrequestreview-2631065350 and it seems to have been addressed but it's ultimately up to @IAlibay .)

orbeckst avatar Sep 08 '25 17:09 orbeckst