Eric Liang
I'm particularly wondering whether the processes using the GPU still appear in the `ps aux` list after `ray status` reports that all GPUs have been released, and also the...
Ok, so it seems Ray isn't properly terminating those processes (short of a manual cluster stop). The open question is why. It would be helpful to figure out...
Btw, it may be helpful to file a new issue so we can bump the priority; this bug is somewhat unrelated to the original issue topic.
Thanks! It should connect automatically to the latest Ray cluster (at least in recent Ray versions). You can force a connection by adding `ray.init("auto")` to the start of the...
Going to close this issue so we can track it in https://github.com/ray-project/ray/issues/31451
Some progress: now passing a good portion of dataset tests:

```
FAILED test_dataset.py::test_bulk_lazy_eval_split_mode[False] - AssertionError: (ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000001000000), BlockMetadata(num_rows=None, size_bytes=10, schema...
FAILED test_dataset.py::test_bulk_lazy_eval_split_mode[True] - AssertionError: (ObjectRef(00ffffffffffffffffffffffffffffffffffffff0100000009000000), BlockMetadata(num_rows=None, size_bytes=10, schema=...
FAILED test_dataset.py::test_basic_actors[False] -...
```
Alright. Let's leave a TODO to fix this test (alternatively, we could fix it by increasing the minimum pool size).
To be clear, this is a bug. The lineage refs should be freed because the loop discards all of its refs on each iteration. But yeah, we should also show this...
@iycheng, could this be a bug in the order in which the refs are deleted? For example, if `ref` is deleted first it's fine, but not if `data` is deleted first? That would explain...
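To make the expected behavior concrete, here is a toy model of reference counting with lineage pinning. It is a hypothetical sketch, not Ray's implementation: the class `ToyRefTracker` and its methods are made up for illustration. It also assumes (the original snippet is truncated) that `data` is a put object and `ref` is a task output whose lineage pins `data`. Under correct behavior, both objects end up freed regardless of which handle is dropped first.

```python
from collections import defaultdict


class ToyRefTracker:
    """Hypothetical toy model, NOT Ray's implementation.

    An object is freed once it has no live user handles and no live
    task output whose lineage still pins it.
    """

    def __init__(self):
        self.handles = defaultdict(int)    # object id -> live user handles
        self.pinned_by = defaultdict(int)  # object id -> live outputs pinning it via lineage
        self.args_of = {}                  # task output id -> argument ids in its lineage
        self.live = set()                  # objects not yet freed

    def put(self, oid):
        self.handles[oid] += 1
        self.live.add(oid)

    def submit(self, out, args):
        # The output's lineage pins each argument until the output is freed.
        self.args_of[out] = list(args)
        for a in args:
            self.pinned_by[a] += 1
        self.put(out)

    def delete(self, oid):
        self.handles[oid] -= 1
        self._maybe_free(oid)

    def _maybe_free(self, oid):
        if oid in self.live and self.handles[oid] == 0 and self.pinned_by[oid] == 0:
            self.live.discard(oid)
            # Freeing an output releases its lineage, which may free its arguments.
            for a in self.args_of.pop(oid, []):
                self.pinned_by[a] -= 1
                self._maybe_free(a)


def run(order):
    t = ToyRefTracker()
    t.put("data")              # assumed: data = ray.put(...)
    t.submit("ref", ["data"])  # assumed: ref = f.remote(data)
    for oid in order:
        t.delete(oid)
    return t.live


print(run(["ref", "data"]))  # set()
print(run(["data", "ref"]))  # set()
```

If the real system leaks `data` only when it is deleted before `ref`, that asymmetry would be consistent with the order-of-deletion hypothesis above.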
I guess it's fine.