Yiltan
The recv-side worker fails with the following if I call `ucp_put_nbx()`. I've error-checked `ucp_mem_map`, `ucp_rkey_pack`, `ucp_ep_rkey_unpack`, etc. Any suggestions on why this error could occur? (this is my own...
### Describe the bug I get ~12000 MB/s for inter-node GPU->GPU data transfers on a ConnectX6 200Gbit. I get ~24000 MB/s for the same test using host memory. Should the...
I'm working on the C version of the code in preparation for (#40). With llm.c and **no** code modifications, I observe the following: - `test_gpt2` works successfully and the loss...
## What Implements the function `uct_cuda_ipc_rkey_ptr`. ## Why? So that we can call `ucp_rkey_ptr` on CUDA memory. This would allow us to write to a remote process's GPU memory...
- Added `--map-by numa` flag - Added `--timeout` flag - Added environment variable to enable/disable get tests
This bug fix is in develop (#31) but it has not been incorporated into ROCm 6.4.x
Cherry picked from #13295 (commit aa5577441ff1ab7f97f8b63e442b37457c7bd997)
## Motivation Users want a tool to compare performance between version X and version Y of our code. ## Design A Python matplotlib script that can be used to compare the...