Thorsten Glüsenkamp

Results 18 comments of Thorsten Glüsenkamp

Cool, I will check that out, thanks :). Maybe it would be helpful to link to this from the README; it is not straightforward to find this in...

But this is merely a workaround, right? Maybe some of the developers (@rkooo567) can chime in?

@jjyao No, the head nodes and worker nodes run on different machines with a shared filesystem. The temp_dir is only defined on the head node in `ray start --temp_dir=...`, doing the...

@ericl I am running version 2.2, and when using ray.tune it seems the worker processes on the worker nodes are not killed and their memory is not deallocated. I.e., when I...

I have written a blog post on the forums (https://discuss.ray.io/t/gpu-memory-is-not-freed-on-cluster-after-ctrl-c-can-i-respond-to-specific-errors-from-within-a-client-node/8835) that is related. Maybe the issue is how I set up my cluster? I do not run locally. I have...

I started it again. `ray status` yields

```
======== Autoscaler status: 2023-01-04 14:18:31.968221 ========
Node status
---------------------------------------------------------------
Healthy:
 1 node_90a4e5e792e7a584e78fbdcd53f7f2abe2350e653f0d2b219265c892
 1 node_a9a07f99061855c976a9f37194d3c8fb847c0364bf204e2ac9e07b37
 1 node_6194a7ac8d34bdccf408ddb625ddc997488aeffcf370b8307437d595
 1 node_46794be6eba5371181ed88a3956006129912129033bba23edb1b88d8
 1 node_5d967084754cc90b3bc40f5f11b1d7287f3779b458585244462b24cc
Pending:
 (no...
```

Can you say what you are looking for? The output string is pretty long and I would rather not post all of it here.

Yes, so I see a bunch of ImplicitFunc:Train and ImplicitFunc:IDLE processes, and those actually do not get killed by ctrl+x. In particular, I get more and more of those remaining...
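For anyone who wants to check for leftover processes like these on a worker node, here is a minimal sketch. It is not part of Ray: the helper names `parse_ps_output` and `list_leftover_processes` are made up for illustration, and the default marker string `ImplicitFunc` is simply taken from the process names mentioned above. It assumes a Linux machine where `ps -eo pid,args` is available.

```python
import subprocess


def parse_ps_output(ps_text, marker="ImplicitFunc"):
    """Return (pid, command) pairs whose command line contains `marker`.

    `ps_text` is expected to be the output of `ps -eo pid,args`,
    i.e. one header row followed by one process per line.
    """
    matches = []
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 1)  # split into PID and the rest
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # ignore blank or malformed lines
        pid, command = int(parts[0]), parts[1]
        if marker in command:
            matches.append((pid, command))
    return matches


def list_leftover_processes(marker="ImplicitFunc"):
    """Run `ps` on this machine and filter its output (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps_output(result.stdout, marker)
```

Calling `list_leftover_processes()` on each worker node after interrupting the driver would show whether any matching processes are still alive. The sketch only lists them; whether and how to terminate them is left to the operator.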

OK, I started #31451. By the way, I tried to run your script above, but I get the error:

```
2023-01-05 00:14:22,528 INFO worker.py:1529 -- Started a local Ray instance....
```