[Core] `ray.wait` does not actually wait until ready when the task runs longer than 12 days
What happened + What you expected to happen
For a task that runs longer than 12 days, ray.wait called without a timeout returns an empty list of ready object refs after 10**6 seconds, which is roughly 11.6 days.
This is inconsistent with ray.get, which keeps blocking until the result is ready when no timeout is specified.
Versions / Dependencies
ray==2.9.3 (but I suppose it happens for all Ray versions), Python 3.10, OS: Ubuntu 20.04
Reproduction script
https://github.com/ray-project/ray/blob/1ccf9254c16d2cb0237fba5aa0a511c1177181c9/python/ray/_private/worker.py#L2852-L2853
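The linked lines are not reproduced here. As a rough illustration of the behavior described above, the sketch below shows how an unspecified timeout turning into 10**6 seconds works out to about 11.6 days. The function name `effective_timeout_s` and the constant name are hypothetical and are not Ray's actual code.

```python
SECONDS_PER_DAY = 86_400
DEFAULT_WAIT_TIMEOUT_S = 10**6  # value reported in this issue


def effective_timeout_s(timeout=None):
    """Hypothetical stand-in: fall back to 10**6 s when no timeout is given."""
    return DEFAULT_WAIT_TIMEOUT_S if timeout is None else timeout


# With no timeout, the wait gives up after about 11.57 days.
days = effective_timeout_s() / SECONDS_PER_DAY
print(f"{days:.2f} days")  # prints "11.57 days"
```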
Issue Severity
None
Hi @Michaelvll, what's the cluster setup? Does the task run on the same node where ray.wait is called?
Yes, the task runs on the same node as the driver, but I believe this happens in multi-node setups as well, because of the code linked above. ray.get does not have this issue.
If it has always been set to 10**6 seconds, we should probably keep it as is so that we don't break compatibility.
It makes sense to have a default timeout like that so the API call does not hang forever. Nonetheless, we should update the docs to mention it.
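For anyone who needs to block past the default in the meantime, one possible workaround is to call the wait function in a loop until enough refs are ready. This is only a sketch: `wait_until_ready` and `poll_timeout_s` are hypothetical names, and the helper takes the wait function as an argument (assumed to follow the `ray.wait(refs, num_returns=..., timeout=...) -> (ready, not_ready)` convention) so it can be tried without a cluster.

```python
def wait_until_ready(wait_fn, refs, num_returns=1, poll_timeout_s=3600.0):
    """Call wait_fn repeatedly until at least num_returns refs are ready.

    wait_fn is assumed to behave like ray.wait: it returns a
    (ready, not_ready) pair and may return fewer than num_returns
    ready refs when its timeout expires.
    """
    while True:
        ready, not_ready = wait_fn(
            refs, num_returns=num_returns, timeout=poll_timeout_s
        )
        if len(ready) >= num_returns:
            return ready, not_ready
        # Timed out before enough refs were ready; wait again.


# Intended usage with Ray (not executed here):
#   ready, not_ready = wait_until_ready(ray.wait, [object_ref])
```

Each individual call still times out, so the loop never depends on the 10**6-second default; it simply retries until the requested number of refs is ready.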