He Jia
The official TF image does not contain compiled TFRT.
It seems there is no rendezvous API in UCP, and there is no clear documentation or example for UCT.
With PCIe, RDMA, TCP/IP, and other scenarios to cover, I do not know what kind of test is appropriate.
It seems rules_cuda supports only clang and nvcc.
There are 8 cards in one node. Should I create endpoints for the other 7 cards in GPUx? Or do I need to use a different method for intra-node GPU...
I'm not sure I understand this correctly. Too many nbx requests cannot be submitted to the UCP worker, or ucp_worker_progress will process them too slowly. So is there...
I noticed there are some epoll-related APIs in UCX. Is it possible to use io_uring?
In XLA, the PJRT API can be used to access XLA kernels from C++ code, which is how the PyTorch XLA backend is implemented. Is there any way to access Tile-lang...
For example, one from GPU, and the other from Host.
INFO: Invocation ID: 016e1cd5-c232-40d3-9c1d-dcf15ee690c3
WARNING: Build options --features and --host_features have changed, discarding analysis cache (this can be expensive, see https://bazel.build/advanced/performance/iteration-speed).
INFO: Analyzed target @@hedron_compile_commands~//:refresh_all (0 packages loaded, 3516 targets...