jyc comments

Results: 32 comments by jyc

Nx on CUDA stops working: "Command buffer has to have a graph executable to be updated"

Current theories which I'm testing on the prod server:

1. Nx is not freeing VRAM, CUDA runs out of memory, starts showing us these errors (although I think we'd get...

Nx on CUDA stops working: "Command buffer has to have a graph executable to be updated"

Thanks! I'm not running in IEx but inside of a Phoenix web app—some processes that call Nx live for a long time (hours), but they don't hold references. I've tried...

Nx on CUDA stops working: "Command buffer has to have a graph executable to be updated"

Thanks! I am indeed currently using the EXLA compiler. If you think the CUDA_ERROR_INVALID_VALUE bug is more likely to be an out-of-memory issue than an Nx bug, is there a...

Nx on CUDA stops working: "Command buffer has to have a graph executable to be updated"

Thinking out loud: I took a look at how you can examine memory usage in JAX, and it looks like their `heap_profile` function just gets all the live PyArrays, gets...

Nx on CUDA stops working: "Command buffer has to have a graph executable to be updated"

> I'd try to use whatever XLA provides for memory tracking per client

I don't think XLA provides anything. JAX's `heap_profile` function in `py_client.cc` code that I linked uses `LiveArrays()`...

Nx on CUDA stops working: "Command buffer has to have a graph executable to be updated"

I checked to see how XLA's Memory Profile Tool works; it consumes NVIDIA CUDA Tools Profiling Interface (CUPTI) events, turning them into XPlane events (?!) and then reading those events,...

Nx on CUDA stops working: "Command buffer has to have a graph executable to be updated"

Hm, I just ran into the original bug again. I think manually calling `:erlang.garbage_collect` in the processes that were serving requests helped; I had to remove `Nx.backend_deallocate` because it would...

Windows return to their original position after snapping in macOS 15.3

Hm, good idea; I don't know what else could be running, but I restarted my computer and things work now. Sorry for the noise and thanks again for making Rectangle!

Cannot compile exla on Mac

I ran into this as well after upgrading macOS. @Joss-Steward's idea of vendoring to disable the warning makes sense. In case it helps others: in my case my `mix.exs` already...

kamal deploy hangs indefinitely after "Pull app image" finishes

Hm. So for a run that didn't hang I actually see more output before "Acquiring the deploy lock":

```
Run kamal deploy --skip-push --version=hash
Pull app image...
INFO [e7a736ab] Running...
```