kuberay
kuberay copied to clipboard
[Bug] Ray component worker_ports is trying to use a port number 12345 that is used by other components.
Search before asking
- [X] I searched the issues and found no similar issues.
KubeRay Component
ray-operator
What happened + What you expected to happen
Version:
- kuberay nightly
- Ray 1.9.2
It's interesting that 12345 is reserved by object manager but worker_ports doesn't exclude it. This is the first time I launch 1.9.2 cluster and it reports this problem but seem it's unrelated.
➜ kuberay git:(master) ✗ k logs -f raycluster-mini-head-7zdcl
2022-02-14 01:44:32,093 INFO scripts.py:612 -- Local node IP: 10.1.17.126
Traceback (most recent call last):
File "/home/ray/anaconda3/bin/ray", line 8, in <module>
sys.exit(main())
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1989, in main
return cli()
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 633, in start
ray_params, head=True, shutdown_at_exit=block, spawn_reaper=block)
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/node.py", line 211, in __init__
self._ray_params.update_pre_selected_port()
File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/parameter.py", line 316, in update_pre_selected_port
f"Ray component {comp} is trying to use "
ValueError: Ray component worker_ports is trying to use a port number 12345 that is used by other components.
Port information: {'gcs': [6379], 'object_manager': [12345], 'node_manager': [12346], 'gcs_server': [], 'client_server': [10001], 'dashboard': [8265], 'dashboard_agent': [65114], 'metrics_export': [59389], 'redis_shards': [], 'worker_ports': [10002, 10003, 10004, 10005, 10006, 10007, 10008, 10009, 10010, 10011, 10012, 10013, 10014, 10015, 10016, 10017, 10018, 10019, 10020, 10021, 10022, 10023, 10024, 10025, 10026, 10027, 10028, 10029, 10030, 10031, 10032, 10033, 10034, 10035, 10036, 10037, 10038, 10039, 10040, 10041, 10042, 10043, 10044, 10045, 10046, 10047, 10048, 10049, 10050, 10051, 10052, 10053, 10054, 10055, 10056, 10057, 10058, 10059, 10060, 10061, 10062, 10063, 10064, 10065, 10066, 10067, 10068, 10069, 10070, 10071, 10072, 10073, 10074, 10075, 10076, 10077, 10078, 10079, 10080, 10081, 10082, 10083, 10084, 10085, 10086, 10087, 10088, 10089, 10090, 10091, 10092, 10093, 10094, 10095, 10096, 10097, 10098, 10099, 10100, 10101, 10102, 10103, 10104, 10105, 10106, 10107, 10108, 10109, 10110, 10111, 10112, 10113, 10114, 10115, 10116, 10117, 10118, 10119, 10120, 10121, 10122, 10123, 10124, 10125, 10126, 10127, 10128, 10129, 10130, 10131, 10132, 10133, 10134, 10135, 10136, 10137, 10138, 10139, 10140, 10141, 10142, 10143, 10144, 10145, 10146, 10147, 10148, 10149, 10150, 10151, 10152, 10153, 10154, 10155, 10156, 10157, 10158, 10159, 10160, 10161, 10162, 10163, 10164, 10165, 10166, 10167, 10168, 10169, 10170, 10171, 10172, 10173, 10174, 10175, 10176, 10177, 10178, 10179, 10180, 10181, 10182, 10183, 10184, 10185, 10186, 10187, 10188, 10189, 10190, 10191, 10192, 10193, 10194, 10195, 10196, 10197, 10198, 10199, 10200, 10201, 10202, 10203, 10204, 10205, 10206, 10207, 10208, 10209, 10210, 10211, 10212, 10213, 10214, 10215, 10216, 10217, 10218, 10219, 10220, 10221, 10222, 10223, 10224, 10225, 10226, 10227, 10228, 10229, 10230, 10231, 10232, 10233, 10234, 10235, 10236, 10237, 10238, 10239, 10240, 10241, 10242, 10243, 10244, 10245, 10246, 10247, 10248, 10249, 10250, 10251, 10252, 10253, 10254, 10255, 10256, 10257, 10258, 10259, 10260, 10261, 10262, 10263, 10264, 10265, 10266, 10267, 10268, 10269, 10270, 10271, 10272, 10273, 10274, 10275, 10276, 10277, 10278, 10279, 10280, 10281, 10282, 10283, 10284, 10285, 10286, 10287, 10288, 10289, 10290, 10291, 10292, 10293, 10294, 10295, 10296, 10297, 10298, 10299, 10300, 10301, 10302, 10303, 10304, 10305, 10306, 10307, 10308, 10309, 10310, 10311, 10312, 10313, 10314, 10315, 10316, 10317, 10318, 10319, 10320, 10321, 10322, 10323, 10324, 10325, 10326, 10327, 10328, 10329, 10330, 10331, 10332, 10333, 10334, 10335, 10336, 10337, 10338, 10339, 10340, 10341, 10342, .....
19922, 19923, 19924, 19925, 19926, 19927, 19928, 19929, 19930, 19931, 19932, 19933, 19934, 19935, 19936, 19937, 19938, 19939, 19940, 19941, 19942, 19943, 19944, 19945, 19946, 19947, 19948, 19949, 19950, 19951, 19952, 19953, 19954, 19955, 19956, 19957, 19958, 19959, 19960, 19961, 19962, 19963, 19964, 19965, 19966, 19967, 19968, 19969, 19970, 19971, 19972, 19973, 19974, 19975, 19976, 19977, 19978, 19979, 19980, 19981, 19982, 19983, 19984, 19985, 19986, 19987, 19988, 19989, 19990, 19991, 19992, 19993, 19994, 19995, 19996, 19997, 19998, 19999]}
If you allocate ports, please make sure the same port is not used by multiple components.
Reproduction script
- install operator
kubectl apply -k "github.com/ray-project/kuberay/manifests/cluster-scope-resources"
kubectl apply -k "github.com/ray-project/kuberay/manifests/base"
- create a cluster
update image to 1.9.2.
kubectl apply -f "kuberay/ray-operator/config/samples/ray-cluster.mini.yaml"
Anything else
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
I think the error comes from https://github.com/ray-project/kuberay/blob/de21b1f2d360f8c6a5b02dc3df91e4214c5cc802/ray-operator/config/samples/ray-cluster.mini.yaml#L23-L24
Once I remove manual allocation, it works fine. I am not sure if that's an upstream issue, I think even with reserved ports, worker_ports should be discovered correctly. Do you have context on this one? @chenk008
@Jeffwan Sorry for delay reply, the min_worker_port default value 10002 results this error. Maybe we should delete port config in all rayStartParams ? They can generated randomly.
I do not observe this issue anymore. Close this issue.