Stephanie Wang
> > Btw, are we going with "Ray cluster" instead of "Ray Cluster"? > > I've been going with `Ray cluster`, wouldn't be surprised if missed something though. Ah okay,...
Hmm, I think the root cause is actually that a node failed.
```
2022-08-09 09:45:50,528 WARNING worker.py:1799 -- The node with node id: 870b6809c74ad48e2e8711c9385c2fef9fda5d43bc0462564f7dfc15 and address: 10.0.3.49 and node name: 10.0.3.49 has been marked dead because the detector has missed too many...
```
Hmm I would vote for making this a release blocker: 1. The test has not been flaky in the past. Took a look at previous failures and the errors don't...
> What's the relationship between this one and the standalone Observability section? > > It's meant to supplement that page by explaining what should be done on a Ray Cluster...
Docs-only changes, not waiting for Windows tests.
I'm hoping @c21 or @clarkzinzow can take the lead on it.
I'm afraid Ray is not expected to work if you specify too small a range of worker ports and the ports are all in use. Your script will slow...
I see, thanks! I've updated the issue title and we'll look into it, but I can't guarantee a timeline. So I strongly suggest that you find a way to work...