Logan Adams comments

Results: 466 comments of Logan Adams

[BUG] oneapi/ccl.hpp: No such file or directory.

@rsxdalv - can you elaborate a little more on what specific failure you are hitting that brought you to this? You are installing the OneAPI libs and then building on...

[ERROR] [launch.py:321:sigkill_handler] exits with return code = -11

@shag1802 - can you share your shm size if using docker at all?

[BUG] Using and Building DeepSpeedCPUAdam

Hi @oabuhamdan - can you summarize the state of this, is there a bug that needs more debugging, or do we think this is something perhaps unique to your setup/cuda/torch/hw?

[BUG] Using and Building DeepSpeedCPUAdam

Thanks for the quick summary @oabuhamdan - I'll test this on my side as well. Though I believe this runs currently in the nv-pre-compile-ops workflow, so this may be setup...

[BUG] Using and Building DeepSpeedCPUAdam

@oabuhamdan - thanks for clarifying, I forgot that our node for that wasn't using GPUs, I'll work on getting a repro and will share my results here.

[BUG] Using and Building DeepSpeedCPUAdam

Hi @oabuhamdan - can you share your `LIBRARY_PATH` and `LD_LIBRARY_PATH` env vars? We've found for some users they don't have these set properly, samples below
```
export LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LIBRARY_PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH"
```
...

Logan Adams

[BUG] oneapi/ccl.hpp: No such file or directory.

[ERROR] [launch.py:321:sigkill_handler] exits with return code = -11

[BUG] Using and Building DeepSpeedCPUAdam

[BUG] Using and Building DeepSpeedCPUAdam

[BUG] Using and Building DeepSpeedCPUAdam

[BUG] Using and Building DeepSpeedCPUAdam

[BUG] Using and Building DeepSpeedCPUAdam

[BUG] Using and Building DeepSpeedCPUAdam

[REQUEST]How to set Ulysses in deepspeed config json?

[BUG] Multi-node fine-tuning with thunderbolt