Logan Adams

Results 466 comments of Logan Adams

@rsxdalv - can you elaborate a little more on what specific failure you are hitting that brought you to this? You are installing the OneAPI libs and then building on...

@shag1802 - can you share your shm size if using docker at all?

Hi @oabuhamdan - can you summarize the state of this, is there a bug that needs more debugging, or do we think this is something perhaps unique to your setup/cuda/torch/hw?

Thanks for the quick summary @oabuhamdan - I'll test this on my side as well. Though I believe this runs currently in the nv-pre-compile-ops workflow, so this may be setup...

@oabuhamdan - thanks for clarifying, I forgot that our node for that wasn't using GPUs. I'll work on getting a repro and will share my results here.

Hi @oabuhamdan - can you share your `LIBRARY_PATH` and `LD_LIBRARY_PATH` env vars? We've found that some users don't have these set properly. Samples below:
```
export LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LIBRARY_PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH"
```
...
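For anyone answering the question above, here is a minimal sketch of one way to check whether a CUDA library directory is present in both variables. It is not part of the original comment: `path_contains`, `report`, and `CUDA_LIB_DIR` are hypothetical names introduced for illustration, and the default directory is simply the one from the sample, which may not match your install.

```shell
# Hypothetical helper: succeed if directory $2 is an entry of the
# colon-separated list $1 (exact string match, so a trailing slash differs).
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed location, taken from the sample exports; override for your system.
CUDA_LIB_DIR="${CUDA_LIB_DIR:-/usr/local/cuda-12.5/lib64}"

# Hypothetical helper: $1 is the variable name (for display), $2 its value.
report() {
  if path_contains "$2" "$CUDA_LIB_DIR"; then
    echo "$1: contains $CUDA_LIB_DIR"
  else
    echo "$1: missing $CUDA_LIB_DIR"
  fi
}

report LIBRARY_PATH "${LIBRARY_PATH:-}"
report LD_LIBRARY_PATH "${LD_LIBRARY_PATH:-}"
```

Wrapping the list in colons before matching avoids false positives from directories that merely share a prefix, such as `/usr/local/cuda-12.5/lib64-old`.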

Thanks @oabuhamdan, that's what I thought but wanted to check on this. Since other issues, like #5659, have similar errors but different signatures, we wanted to isolate things, this shouldn't...

Okay @oabuhamdan - we've merged a first PR that we believe doesn't resolve your issue, but could you check again and confirm that it doesn't?

Hi @xs1997zju - are you asking how to enable DeepSpeed Ulysses?

Hi @Raywang0211 - that's very interesting, but the lack of error messages will make this hard to debug. Are you able to confirm ssh connections work fine normally/without DeepSpeed?