[WIP][Infer] Inference Distributed RPC Framework Optimization

Open LRY89757 opened this issue 1 year ago • 0 comments

Optimize the data path from List -> CPU Tensor -> List -> rpc_param -> GPU Tensor to List -> rpc_param -> GPU Tensor
Wrap the async forward only once
Only the rank 0 worker runs the sampler and returns the result
Pass the RPC param only to worker 0 instead of to all workers; worker 0 then broadcasts the param to the other workers using NCCL
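
The control flow in the last two items can be sketched roughly as below. This is a toy, single-process model in plain Python, not the framework's code: `Worker`, `Group`, `rpc_to_rank0`, `broadcast_from_rank0`, and `step` are hypothetical names, the "RPC" is a direct attribute write, and the "broadcast" is a loop standing in for an NCCL broadcast. It only illustrates that the driver sends the params once and that only rank 0 samples and returns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    """Hypothetical stand-in for one inference worker."""
    rank: int
    received: Optional[dict] = None

    def forward(self) -> List[float]:
        # Placeholder forward pass: one fake "logit" per input sequence.
        return [float(sum(seq)) for seq in self.received["input_ids"]]

    def sample(self, logits: List[float]) -> List[int]:
        # Placeholder sampler; in this sketch it is only called on rank 0.
        return [int(x) % 100 for x in logits]


class Group:
    """Hypothetical driver-side view of the worker group."""

    def __init__(self, world_size: int) -> None:
        self.workers = [Worker(rank) for rank in range(world_size)]
        self.driver_rpc_calls = 0  # how many times the driver sent params

    def rpc_to_rank0(self, params: dict) -> None:
        # Stand-in for a single RPC from the driver to worker 0.
        self.driver_rpc_calls += 1
        self.workers[0].received = params

    def broadcast_from_rank0(self) -> None:
        # Stand-in for worker 0 broadcasting the params (NCCL in the issue).
        params = self.workers[0].received
        for worker in self.workers[1:]:
            worker.received = params

    def step(self, input_ids: List[List[int]]) -> List[int]:
        # Params stay plain Python lists until they reach the workers,
        # i.e. no intermediate CPU tensor round trip on the driver side.
        self.rpc_to_rank0({"input_ids": input_ids})
        self.broadcast_from_rank0()
        logits_per_rank = [worker.forward() for worker in self.workers]
        # Only rank 0 runs the sampler and returns a value to the driver.
        return self.workers[0].sample(logits_per_rank[0])
```

With four workers, `Group(4).step([[1, 2, 3], [4, 5]])` makes one driver-side send regardless of the world size; the fan-out to the remaining ranks happens inside the worker group.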

Performance is not yet good enough and needs further optimization.

May 27 '24 07:05 LRY89757