Inconsistent behavior between identical runs leading to simulation hanging
Describe the bug
When running the fundamental integration example (see: https://github.com/GMLC-TDC/HELICS-Examples/pull/72), the simulation sometimes completes successfully and sometimes hangs, without any change to the code or the run command.
This seems to be linked to the time request made by the controller object; see the table in this comment.
What is the expected behavior?
The expected behavior would be for each run to produce the same result.
To Reproduce
- Check out branch `eranschweitzer/issue54` from the HELICS-Examples repository.
- Navigate to the folder `user_guide_examples/fundamental/fundamental_integration`.
- Run `helics run --path=fundamental_integration_runner.json`.
- Change the call to request time in `destroy_federate` by commenting/uncommenting this line: https://github.com/GMLC-TDC/HELICS-Examples/blob/9b24a2d8c73f572f31f1fd9baf2fcf75fdeb0ee1/user_guide_examples/fundamental/fundamental_integration/Controller.py#L40
- Change the controller's time request by selecting between these lines: https://github.com/GMLC-TDC/HELICS-Examples/blob/9b24a2d8c73f572f31f1fd9baf2fcf75fdeb0ee1/user_guide_examples/fundamental/fundamental_integration/Controller.py#L105-L107
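Because the failure is intermittent, a single run says little about a given configuration. Below is a minimal sketch of a repeat-run harness. It assumes the runner is launched exactly as in the steps above, from the `fundamental_integration` folder with `helics` on PATH. The harness runs the same command several times and counts any run that exceeds a timeout as a hang. The name `count_outcomes`, the run count, and the 300-second default timeout are illustrative choices. They are not part of HELICS or the examples repository.

```python
import subprocess
from collections import Counter


def count_outcomes(cmd, runs=10, timeout_s=300.0):
    """Run `cmd` repeatedly and tally how each run ended.

    Hypothetical helper for this report: a run that exceeds `timeout_s`
    seconds is counted as "hung"; otherwise it is counted as "ok"
    (exit status 0) or "error" (any other exit status).
    """
    tally = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child before re-raising this.
            tally["hung"] += 1
            continue
        tally["ok" if result.returncode == 0 else "error"] += 1
    return tally


# Command from the reproduction steps (assumes `helics` is on PATH and the
# working directory is the fundamental_integration folder).
HELICS_CMD = ["helics", "run", "--path=fundamental_integration_runner.json"]
# Example: print(count_outcomes(HELICS_CMD, runs=20))
```

Running this once for each time-request variant gives a rough hang rate per configuration. Note that `helics run` launches each federate as a separate process. When a run is killed on timeout, the federate processes it started may keep running, so check for leftover Python processes before the next run.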
Environment (please complete the following information):
- Operating System: Windows
- Language Extension: Python
- What compiler or setup process did you use: installed via pip
- HELICS version:
> helics --version
helics, version v3.2.1.post8
Python HELICS version v3.2.1.post8
HELICS Library version 3.2.1 (2022-06-16)
Additional context and information
Take a look at the conversation in this pull request: https://github.com/GMLC-TDC/HELICS-Examples/pull/72
This may also be tied to issue #1510.
@eranschweitzer, can you try this on the new release when you get the chance? I made a fix for some cases that could be related, so I need to check it against the new release. If that doesn't fix it, I will keep at it.
I'll look into it next week and let you know.
@phlptp
I'm still running into issues.
I ran the case where I request `HELICS_MAX_TIME` and there is a final time request in the `destroy_federate` function (the third line of this table).
The good news is that it did finish a couple of times. The bad news is that I ran into some errors.
Once, the controller errored out after the simulation was actually done, because of a size mismatch in the arrays I was collecting for plotting. (The fact that there was one extra value suggests to me that something is still going on with those last time request calls.)
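A size mismatch with one extra value means that one collected series was appended to on one more step than the others. Below is a minimal sketch of a pre-plot check. It assumes the controller stores its results in plain Python lists. The helper name `check_series_lengths` and the series names in the example call are hypothetical and are not taken from Controller.py. The check reports which series disagree and by how much, instead of letting the plotting call fail.

```python
def check_series_lengths(**series):
    """Describe length mismatches among named series, or return None.

    Hypothetical helper: call it on the arrays collected during the
    co-simulation before plotting them against each other.
    """
    lengths = {name: len(values) for name, values in series.items()}
    if len(set(lengths.values())) <= 1:
        return None  # all series have the same length (or none were passed)
    shortest = min(lengths.values())
    # How many entries each longer series has beyond the shortest one.
    extras = {name: n - shortest for name, n in lengths.items() if n > shortest}
    return f"length mismatch {lengths}; extra entries: {extras}"


# Example with hypothetical data: the second series has one extra value.
problem = check_series_lengths(time_values=[0, 900, 1800], soc_values=[0.1, 0.3, 0.5, 0.6])
if problem:
    print(problem)
```

Logging this message also shows which series grew on the extra final step. That can help confirm or rule out the last time request calls as the source.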
Another time, the simulation hung on the very last time request.
An interesting side note/observation: the issues usually seem to occur when I do something else while the simulation is running. I'm working in VS Code using the integrated terminal, and I might open the log files to see whether they are being written to, or I'll look at another piece of code while I'm waiting. I'm not really sure what to make of this, but I thought it might be useful info.
OK, I will keep poking at it. Thanks.
Is this still an on-going issue or has it been resolved?
I just read through the comments here and on the pull request where this came up, and I see that the issue was unresolved. I have not done anything with it, so my guess is that either (a) it is on-going, or (b) it was coincidentally resolved by some unrelated update.