He Jia issues

Results: 32 issues by He Jia

Could TFRT provide an image that can be easily deployed? How can a TensorFlow wheel be compiled with TFRT?

The official TF image does not contain compiled TFRT.

How can I preallocate a buffer through the rendezvous protocol before ucp_tag_recv_nbx actually receives the data?

It seems there is no rendezvous API in UCP, and there is no clear documentation or example for UCT.

Is there any benchmark comparing P2P communication in NCCL and UCX (UCP)?

It should cover PCI-E, RDMA, TCP/IP, and other scenarios; I do not know what kind of test is appropriate.

How can I use the nvc++ toolchain?

It seems only clang and nvcc are supported in rules_cuda.

What's the best practice for intra-node GPU communication when using Active Message?

There are 8 cards in one node. Should GPUx create endpoints to each of the other 7 cards? Or do I need to use different methods when intra-node GPU...

How to get the number of pending tasks in the UCP worker queue?

I'm not sure if I understand this correctly. Too many nbx operations should not be submitted to the UCP worker, or ucp_worker_progress will process them too slowly. So is there...

How to use UCX with io_uring?

I noticed there are some epoll-related APIs in UCX. Is it possible to use io_uring?

What's the C++ API for compiling and calling a tilelang kernel?

In XLA, the PJRT API can be used to access XLA kernels from C++ code, which is how the PyTorch XLA backend is implemented. Is there any way to access Tile-lang...

Does UCP support sending IOV data from different devices?

For example, one buffer from the GPU and the other from the host.

Running @hedron_compile_commands//:refresh_all fails when using rules_foreign_cc cmake

INFO: Invocation ID: 016e1cd5-c232-40d3-9c1d-dcf15ee690c3
WARNING: Build options --features and --host_features have changed, discarding analysis cache (this can be expensive, see https://bazel.build/advanced/performance/iteration-speed).
INFO: Analyzed target @@hedron_compile_commands~//:refresh_all (0 packages loaded, 3516 targets...