[WIP] Nexus: Enhanced simulations with error handling, branching and looping
Proposed changes
This PR introduces new capabilities in Nexus for simulation error handling, branching, and looping workflows. The implementation adds EnhancedSimulation and EnhancedProjectManager classes that extend the existing simulation framework with these advanced workflow features.
The new functionality is opt-in only and backward compatible: existing Nexus scripts continue to work unchanged. The enhanced capabilities are only activated when simulations are explicitly converted to EnhancedSimulation instances using the make_enhanced() wrapper function.
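For readers unfamiliar with the opt-in pattern described above, here is a minimal, self-contained sketch of how a wrapper can upgrade an existing object without affecting unwrapped ones. All names in it (`ToySimulation`, `ToyEnhancedSimulation`, `toy_make_enhanced`, `max_retries`) are hypothetical stand-ins for illustration. It is not the code in this PR, and the actual `make_enhanced()` signature and behavior may differ.

```python
class ToySimulation:
    """Stand-in for an existing base simulation class (not the Nexus class)."""

    def __init__(self, identifier):
        self.identifier = identifier

    def describe(self):
        return f"{self.identifier}: plain"


class ToyEnhancedSimulation(ToySimulation):
    """Hypothetical subclass that layers extra workflow state on the base class."""

    max_retries = 0  # assumed attribute, for illustration only

    def describe(self):
        return f"{self.identifier}: enhanced (max_retries={self.max_retries})"


def toy_make_enhanced(sim, max_retries=3):
    """Opt in a single instance; objects that are never wrapped keep base behavior."""
    if not isinstance(sim, ToyEnhancedSimulation):
        # Swap the instance's class in place so existing references stay valid.
        sim.__class__ = ToyEnhancedSimulation
    sim.max_retries = max_retries
    return sim


legacy = ToySimulation("scf")
upgraded = toy_make_enhanced(ToySimulation("nscf"), max_retries=2)
```

Because the enhanced class inherits from the base class, code that only expects the base interface keeps working on both `legacy` and `upgraded`.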
Examples in nexus/examples/quantum_espresso/ (directories 03-08) demonstrate the new features:
- 03_machine: Machine-specific execution and dependencies
- 04_loop: Loop-enabled simulations with iteration control
- 05_conditionals: Conditional execution utilities
- 06_branching: Branching workflows with create_branch()
- 07_merging: Merging strategies for parallel branches
- 08_error_handlers: Error handling with automatic retry and input modification
Note: The examples are designed for demonstration purposes. Error handlers may not be necessary for these simple test cases, but they illustrate the error handling capabilities available for more complex workflows.
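To make "automatic retry and input modification" concrete, the general idea can be sketched as below. This is an illustration under stated assumptions, not the handler API added in this PR: `run_with_retry`, the `"not_converged"` error label, the `handlers` mapping, and the use of `mixing_beta` as the adjusted input are all hypothetical choices made for the example.

```python
def run_with_retry(run, inputs, handlers, max_retries=3):
    """Run a job, letting a handler modify the inputs after each failure.

    run(inputs) returns None on success or an error label (str) on failure.
    handlers maps an error label to a function returning modified inputs.
    Returns the final inputs and the number of attempts made.
    """
    inputs = dict(inputs)  # work on a copy so the caller's inputs are untouched
    attempts = 0
    while True:
        attempts += 1
        error = run(inputs)
        if error is None:
            return inputs, attempts
        handler = handlers.get(error)
        if handler is None or attempts > max_retries:
            raise RuntimeError(f"unrecovered error {error!r} after {attempts} attempt(s)")
        inputs = handler(inputs)


def fake_scf(inputs):
    """Toy job: 'converges' only when the mixing parameter is small enough."""
    return None if inputs["mixing_beta"] <= 0.3 else "not_converged"


def halve_mixing(inputs):
    """Hypothetical handler: retry with a more conservative mixing parameter."""
    return {**inputs, "mixing_beta": inputs["mixing_beta"] / 2}


final_inputs, attempts = run_with_retry(
    fake_scf, {"mixing_beta": 0.7}, {"not_converged": halve_mixing}
)
```

In this toy run the job fails at 0.7 and 0.35 and succeeds on the third attempt. An error with no registered handler, or one that persists past `max_retries`, is raised rather than retried indefinitely.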
This PR is shared for collaboration and discussion to gather feedback on the implementation approach and identify potential improvements.
What type(s) of changes does this code introduce?
- New feature
- Documentation or build script changes
Does this introduce a breaking change?
- No
What systems has this change been tested on?
Local development environment
Checklist
- [x] I have read the pull request guidance and develop docs
- [x] This PR is up to date with the current state of 'develop'
- [x] Code added or changed in the PR has been clang-formatted
- [ ] This PR adds tests to cover any new code, or to catch a bug that is being fixed
- [ ] Documentation has been added (if appropriate)
Good to see! Q. Can these simply replace the current classes? We learned from QMCPACK C++ that having 2 or more of anything is a bad idea due to maintenance costs and challenges for contributors. e.g. Driver.cpp, DriverEnhanced.cpp, DriverEnhancedNew2.cpp etc.
@prckent They inherit from the current classes, so replacing them should be easy. However, I thought it better to keep them as separate classes for now so that they can be tested without breaking anything.
The examples I provided work with the --status-only and --generate-only options. I am open to ideas for new test cases where we can better see how the error recovery works.
Very interesting Kayahan. I will have to look this over closely. I had considered this type of functionality a long time ago but never had time to pursue it. Moving from DAG to DG would be a big step forward.