Updating to ray2

Open joshua-cogliati-inl opened this issue 2 years ago • 13 comments


Pull Request Description

What issue does this change request address?

#1972

What are the significant changes in functionality due to this change request?

Updates ray to version 2. Always passes PYTHONPATH to ray init.
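
As an illustration of passing PYTHONPATH to ray init, here is a minimal sketch. It assumes the variable is forwarded through Ray's `runtime_env` `env_vars` option, which may not be the mechanism this change request uses. The helper name `build_ray_init_kwargs` is hypothetical and not part of RAVEN or Ray.

```python
import os


def build_ray_init_kwargs(num_cpus, include_dashboard=False, environ=None):
    """Build keyword arguments for ray.init (hypothetical helper, not RAVEN code).

    Forwards PYTHONPATH to the Ray workers whenever it is set in the
    given environment mapping (defaults to os.environ).
    """
    environ = os.environ if environ is None else environ
    kwargs = {
        "num_cpus": int(num_cpus),
        "include_dashboard": include_dashboard,
    }
    python_path = environ.get("PYTHONPATH")
    if python_path:
        # Assumption: runtime_env["env_vars"] is used to set environment
        # variables for the worker processes.
        kwargs["runtime_env"] = {"env_vars": {"PYTHONPATH": python_path}}
    return kwargs


# Intended use (requires ray to be installed):
#   import ray
#   ray.init(**build_ray_init_kwargs(4))
```

Building the arguments separately from the `ray.init` call keeps the PYTHONPATH handling easy to check without starting a Ray cluster.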


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • [ ] 1. Review all computer code.
  • [ ] 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • [ ] 3. Make sure the Python code and commenting standards are respected (camelBack, etc.) - See the wiki for details.
  • [ ] 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • [ ] 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added setting, in <RunInfo> XML block, the node <internalParallel> to True.
  • [ ] 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • [ ] 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • [ ] 8. If an analytic test is changed/added, is the analytic documentation updated/added?
  • [ ] 9. If any tests used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) have been changed, the associated documentation must be reviewed to ensure the text matches the example.

joshua-cogliati-inl avatar Sep 29 '22 15:09 joshua-cogliati-inl

Job Precheck on 743a8f0 : invalidated by @joshua-cogliati-inl

failed in fetch

moosebuild avatar Sep 30 '22 15:09 moosebuild

Job Precheck on 743a8f0 : invalidated by @joshua-cogliati-inl

failed in fetch

moosebuild avatar Sep 30 '22 16:09 moosebuild

Job Precheck on 743a8f0 : invalidated by @milljm

moosebuild avatar Sep 30 '22 16:09 moosebuild

Job Precheck on 743a8f0 : invalidated by @joshua-cogliati-inl

failed in fetch

moosebuild avatar Sep 30 '22 17:09 moosebuild

Job Test qsubs sawtooth on ca0493d : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/cluster_tests/AdaptiveSobol/test_parallel_adaptive_sobol

moosebuild avatar Oct 03 '22 14:10 moosebuild

Hm, mac failed with:

(    0.14 sec) Job Handler              : DEBUG           -> Initializing ray locally with num_cpus:  4
2022-09-30 14:55:25,799	ERROR node.py:742 -- Unable to succeed in selecting a random port.
Traceback (most recent call last):
  File "/Users/civet/civet/build_0/raven/raven_framework.py", line 26, in <module>
    sys.exit(main(True))
  File "/Users/civet/civet/build_0/raven/ravenframework/Driver.py", line 203, in main
    raven()
  File "/Users/civet/civet/build_0/raven/ravenframework/Driver.py", line 155, in raven
    simulation.initialize()
  File "/Users/civet/civet/build_0/raven/ravenframework/Simulation.py", line 543, in initialize
    self.jobHandler.initialize()
  File "/Users/civet/civet/build_0/raven/ravenframework/JobHandler.py", line 140, in initialize
    self.__initializeRay()
  File "/Users/civet/civet/build_0/raven/ravenframework/JobHandler.py", line 224, in __initializeRay
    self.rayServer = ray.init(num_cpus=int(self.runInfoDict['totalNumCoresUsed']),include_dashboard=db) if _rayAvail else \
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/worker.py", line 1420, in init
    _global_node = ray._private.node.Node(
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/node.py", line 267, in __init__
    self._ray_params.update_pre_selected_port()
  File "/Users/civet/.conda/envs/raven_libraries/lib/python3.8/site-packages/ray/_private/parameter.py", line 326, in update_pre_selected_port
    raise ValueError(
ValueError: Ray component dashboard_agent_grpc is trying to use a port number 65534 that is used by other components.
Port information: {'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random', 'gcs_server': 65534, 'client_server': 'random', 'dashboard': 'random', 'dashboard_agent_grpc': 65534, 'dashboard_agent_http': 52365, 'metrics_export': 65535, 'redis_shards': 'random', 'worker_ports': 'random'}
If you allocate ports, please make sure the same port is not used by multiple components.

Running test failed with exit code -15
(678F1/819) Failed (  7.69sec)tests/framework/InternalParallelTests/ROMscikit
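
To make the conflict in the port information above easier to see, here is a small sketch that groups components by fixed port number and reports any port claimed more than once. The helper `find_port_conflicts` is hypothetical and not part of Ray. It only inspects the dictionary printed in the error message.

```python
from collections import defaultdict


def find_port_conflicts(port_info):
    """Return {port: [components]} for fixed ports claimed by several components.

    Entries whose value is not an integer (e.g. 'random') are ignored.
    """
    by_port = defaultdict(list)
    for component, port in port_info.items():
        if isinstance(port, int):
            by_port[port].append(component)
    return {port: sorted(names) for port, names in by_port.items() if len(names) > 1}


# Port information copied from the error message above.
port_info = {
    'gcs': 'random', 'object_manager': 'random', 'node_manager': 'random',
    'gcs_server': 65534, 'client_server': 'random', 'dashboard': 'random',
    'dashboard_agent_grpc': 65534, 'dashboard_agent_http': 52365,
    'metrics_export': 65535, 'redis_shards': 'random', 'worker_ports': 'random',
}

conflicts = find_port_conflicts(port_info)
print(conflicts)  # {65534: ['dashboard_agent_grpc', 'gcs_server']}
```

Here `gcs_server` and `dashboard_agent_grpc` were both assigned port 65534, which matches the ValueError.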

joshua-cogliati-inl avatar Oct 03 '22 14:10 joshua-cogliati-inl

Job Test mac on ca0493d : invalidated by @joshua-cogliati-inl

ValueError: Ray component dashboard_agent_grpc is trying to use a port number 65534 that is used by other components.

moosebuild avatar Oct 03 '22 14:10 moosebuild

Job Test Ubuntu 16 on 743a8f0 : invalidated by @joshua-cogliati-inl

This is a \test\for\Everything

moosebuild avatar Oct 03 '22 15:10 moosebuild

Job Test Ubuntu 16 on 743a8f0 : canceled by @joshua-cogliati-inl

f: \ stuff

moosebuild avatar Oct 03 '22 15:10 moosebuild

Job Test Ubuntu 16 on 743a8f0 : invalidated by @joshua-cogliati-inl

Restarting.

moosebuild avatar Oct 03 '22 16:10 moosebuild

Job Test Ubuntu 16 on ca0493d : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Oct 04 '22 18:10 moosebuild

Windows failed.

wangcj05 avatar Oct 05 '22 21:10 wangcj05

Job Test Fedora 31 on 6bbda33 : invalidated by @joshua-cogliati-inl

FAILED: Diff tests/framework/PostProcessors/EconomicRatio/timeDepDataset

moosebuild avatar Oct 05 '22 22:10 moosebuild

Job Mingw Test on 6bbda33 : invalidated by @wangcj05

testing

moosebuild avatar Nov 15 '22 18:11 moosebuild

Job Test Ubuntu 18 PIP on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test Ubuntu 18 PIP on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test CentOS 8 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test Fedora 31 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test Fedora 32 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test Ubuntu 16 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test Ubuntu 18-2 Python 3 on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test Ubuntu 20-2 Optional on 164e372 : invalidated by @joshua-cogliati-inl

restarted civet

moosebuild avatar Nov 21 '22 18:11 moosebuild

Job Test Fedora 31 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

moosebuild avatar Nov 21 '22 20:11 moosebuild

Job Test Fedora 32 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

moosebuild avatar Nov 21 '22 20:11 moosebuild

Job Test Ubuntu 16 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

moosebuild avatar Nov 21 '22 20:11 moosebuild

Job Test Ubuntu 20-2 Optional on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

moosebuild avatar Nov 21 '22 20:11 moosebuild

Job Test Ubuntu 18-2 Python 3 on 164e372 : invalidated by @joshua-cogliati-inl

mystery failure

moosebuild avatar Nov 21 '22 20:11 moosebuild

checklist is good, and tests are green. PR can be merged.

wangcj05 avatar Nov 28 '22 16:11 wangcj05