ert icon indicating copy to clipboard operation
ert copied to clipboard

Implement async version EnsembleEvaluator

Open xjules opened this issue 1 year ago • 8 comments

  • A new asyncio.loop is set in base_run_model to run the entire EEAsync.
  • EnsembleEvaluatorAsync runs when the scheduler feature is enabled
  • This keeps the old synced EnsembleEvaluator untouched and is running with JobQueue
  • Async evaluator does not use BatchingDispatcher, although the events are still processed in batches

Issue Resolves https://github.com/equinor/ert/issues/6958 Resolves https://github.com/equinor/ert/issues/3124

~~Blocked by https://github.com/equinor/ert/pull/6811~~ Managed to run tests using synced monitor.py.

Approach Short description of the approach

(Screenshot of new behavior in GUI if applicable)

  • [x] PR title captures the intent of the changes, and is fitting for release notes.
  • [x] Added appropriate release note label
  • [ ] Commit history is consistent and clean, in line with the contribution guidelines.
  • [ ] Make sure tests pass locally (after every commit!)

When applicable

  • [ ] When there are user facing changes: Updated documentation
  • [ ] New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • [ ] Large PR: Prepare changes in small commits for more convenient review
  • [ ] Bug fix: Add regression test for the bug
  • [ ] Bug fix: Create Backport PR to latest release

xjules avatar Jan 23 '24 07:01 xjules

Codecov Report

Attention: Patch coverage is 99.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 89.84%. Comparing base (a51790c) to head (bee3a97). Report is 1 commits behind head on main.

Files Patch % Lines
src/ert/ensemble_evaluator/evaluator.py 99.25% 1 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #6994   +/-   ##
=======================================
  Coverage   89.84%   89.84%           
=======================================
  Files         347      346    -1     
  Lines       21138    21091   -47     
=======================================
- Hits        18991    18950   -41     
+ Misses       2147     2141    -6     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jan 31 '24 08:01 codecov-commenter

Can you document in the PR or the issue why we should add an extra ensemble evaluator now, instead of converting the existing to async when the legacy JobQueue is purged?

berland avatar Mar 25 '24 20:03 berland

Can you document in the PR or the issue why we should add an extra ensemble evaluator now, instead of converting the existing to async when the legacy JobQueue is purged?

This PR is still WIP and therefore there is the EnsembleEvaluatorAsync as a new class as it is much easier to find bugs (as of now). Once "confident" that it works I will rename EnsembleEvaluatorAsync -> EnsembleEvaluator and make corresponding changes - in different PR I guess.

Another thing is that Async version might have a slightly different API, which might require to update lots of tests - again should be part of a new PR mentioned above.

xjules avatar Mar 26 '24 09:03 xjules

Really good use of typing @xjules!!💪

jonathan-eq avatar Apr 04 '24 13:04 jonathan-eq

I have tested LSF cluster with EEAsync and it worked smoothly. Nevertheless, when cancelling some realizations are shown as running image

Need to find out if it is related to EEAsync closing server too fast.


Update: It seems that the same behavior is in the old job_queue and ee and therefore all is good.

xjules avatar Apr 17 '24 13:04 xjules

~~TODO: fix /private/jparu/workspace/ert/kenv/root/lib64/python3.8/site-packages/ert/ensemble_evaluator/_wait_for_evaluator.py to use new async version.~~ Actually it is not relevant since this comes from Monitor which uses duplexer on the inside.

xjules avatar Apr 18 '24 10:04 xjules

We might need more tests

jonathan-eq avatar May 15 '24 06:05 jonathan-eq

Code coverage claims that 29 lines are missing coverage, could you check these and see if we need some more testing?

berland avatar May 31 '24 12:05 berland