Implement async version of EnsembleEvaluator
- A new asyncio event loop is set up in `base_run_model` to run the entire EEAsync (see the first sketch after this list).
- EnsembleEvaluatorAsync runs when the scheduler feature is enabled
- This keeps the old synced EnsembleEvaluator untouched; it still runs with JobQueue.
- The async evaluator does not use BatchingDispatcher, although events are still processed in batches (see the second sketch after this list).
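Here is a minimal sketch of how that dispatch could look in `base_run_model`. The `use_scheduler` flag, the `run()` coroutine, and the constructor signatures are illustrative assumptions, not the actual ert API:

```python
import asyncio

def base_run_model(ensemble, config, use_scheduler: bool) -> None:
    # Hypothetical signature; only the dispatch pattern is the point here.
    if use_scheduler:
        # New path: a dedicated event loop drives the whole async evaluator.
        evaluator = EnsembleEvaluatorAsync(ensemble, config)
        loop = asyncio.new_event_loop()
        try:
            loop.run_until_complete(evaluator.run())
        finally:
            loop.close()
    else:
        # Legacy path: the synchronous evaluator backed by JobQueue.
        evaluator = EnsembleEvaluator(ensemble, config)
        evaluator.run()
```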
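And a sketch of how events can still be batched without a dedicated BatchingDispatcher: the consumer drains an asyncio.Queue into a batch before handling it. The function name, queue, batch size, and timeout are assumptions for illustration:

```python
import asyncio
from typing import Any, List

async def get_event_batch(
    events: "asyncio.Queue[Any]",
    max_batch: int = 500,
    timeout: float = 2.0,
) -> List[Any]:
    """Collect up to max_batch events, waiting at most `timeout` for the first."""
    batch: List[Any] = []
    try:
        # Block briefly for the first event so an idle loop stays cheap.
        batch.append(await asyncio.wait_for(events.get(), timeout=timeout))
    except asyncio.TimeoutError:
        return batch  # nothing arrived within the window
    # Drain whatever else is already queued, up to the batch limit.
    while len(batch) < max_batch:
        try:
            batch.append(events.get_nowait())
        except asyncio.QueueEmpty:
            break
    return batch
```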
Issue
Resolves https://github.com/equinor/ert/issues/6958
Resolves https://github.com/equinor/ert/issues/3124
~~Blocked by https://github.com/equinor/ert/pull/6811~~ Managed to run tests using synced monitor.py.
- [x] PR title captures the intent of the changes, and is fitting for release notes.
- [x] Added appropriate release note label
- [ ] Commit history is consistent and clean, in line with the contribution guidelines.
- [ ] Make sure tests pass locally (after every commit!)
When applicable
- [ ] When there are user facing changes: Updated documentation
- [ ] New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
- [ ] Large PR: Prepare changes in small commits for more convenient review
- [ ] Bug fix: Add regression test for the bug
- [ ] Bug fix: Create Backport PR to latest release
Codecov Report
Attention: Patch coverage is 99.33333% with 1 line in your changes missing coverage. Please review.
Project coverage is 89.84%. Comparing base (a51790c) to head (bee3a97). Report is 1 commit behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| src/ert/ensemble_evaluator/evaluator.py | 99.25% | 1 Missing :warning: |
Additional details and impacted files
```
@@           Coverage Diff           @@
##             main    #6994   +/-   ##
=======================================
  Coverage   89.84%   89.84%
=======================================
  Files         347      346       -1
  Lines       21138    21091      -47
=======================================
- Hits        18991    18950      -41
+ Misses       2147     2141       -6
```
Can you document in the PR or the issue why we should add an extra ensemble evaluator now, instead of converting the existing to async when the legacy JobQueue is purged?
This PR is still WIP, which is why EnsembleEvaluatorAsync exists as a new class: it is much easier to find bugs that way (as of now). Once I am confident that it works, I will rename EnsembleEvaluatorAsync -> EnsembleEvaluator and make the corresponding changes - in a different PR, I guess.
Another thing is that the async version might have a slightly different API, which might require updating lots of tests - again, this should be part of the new PR mentioned above.
Really good use of typing @xjules!!💪
I have tested EEAsync on an LSF cluster and it worked smoothly.
Nevertheless, when cancelling, some realizations are shown as running.
I need to find out whether it is related to EEAsync closing the server too fast.
Update: It seems that the same behavior occurs with the old job_queue and ee, so all is good.
~~TODO: fix /private/jparu/workspace/ert/kenv/root/lib64/python3.8/site-packages/ert/ensemble_evaluator/_wait_for_evaluator.py to use the new async version.~~
Actually, it is not relevant, since this comes from Monitor, which uses a duplexer on the inside.
We might need more tests.
Code coverage claims that 29 lines are missing coverage; could you check these and see whether we need some more testing?