[WIP][Infer] Inference Distributed RPC Framework Optimization

Open LRY89757 opened this issue 1 year ago • 0 comments

  1. Shorten the data path from List -> CPU Tensor -> List -> rpc_param -> GPU Tensor to List -> rpc_param -> GPU Tensor.
  2. Wrap the async forward only once.
  3. Only the rank 0 worker runs the sampler and returns the result.
  4. Pass the RPC param only to worker 0 instead of to all workers; worker 0 then broadcasts the param to all workers using NCCL.
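
The control flow in items 3 and 4 can be sketched roughly as below. This is a toy, in-process simulation and not the ColossalAI implementation: threads stand in for workers, `ToyBroadcastGroup` is a hypothetical stand-in for an NCCL broadcast, and `toy_forward` / `toy_sample` are placeholder functions invented for illustration.

```python
import threading
from typing import Any, List, Optional, Tuple


class ToyBroadcastGroup:
    """Hypothetical in-process stand-in for a collective group (NOT NCCL)."""

    def __init__(self, world_size: int) -> None:
        self._barrier = threading.Barrier(world_size)
        self._slot: Any = None

    def broadcast(self, obj: Any, rank: int, src: int = 0) -> Any:
        if rank == src:
            self._slot = obj          # only the source rank publishes its value
        self._barrier.wait()          # all ranks wait until the value is published
        value = self._slot
        self._barrier.wait()          # all ranks have read before the slot is reused
        return value


def toy_forward(input_ids: List[int]) -> List[int]:
    # Placeholder for the model forward pass: returns fake "logits".
    return [2 * t for t in input_ids]


def toy_sample(logits: List[int]) -> int:
    # Placeholder sampler: greedy argmax over the fake logits.
    return max(range(len(logits)), key=logits.__getitem__)


def worker_step(rank: int, group: ToyBroadcastGroup,
                rpc_param: Optional[dict]) -> Tuple[dict, Optional[int]]:
    # Only rank 0 was handed rpc_param; the other ranks obtain it via broadcast.
    param = group.broadcast(rpc_param, rank, src=0)
    logits = toy_forward(param["input_ids"])
    # Only rank 0 runs the sampler and returns a value.
    output = toy_sample(logits) if rank == 0 else None
    return param, output


def run_step(world_size: int, rpc_param: dict):
    group = ToyBroadcastGroup(world_size)
    received: List[Optional[dict]] = [None] * world_size
    outputs: List[Optional[int]] = [None] * world_size

    def target(rank: int) -> None:
        local = rpc_param if rank == 0 else None  # the "RPC" reaches rank 0 only
        received[rank], outputs[rank] = worker_step(rank, group, local)

    threads = [threading.Thread(target=target, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, outputs


if __name__ == "__main__":
    received, outputs = run_step(4, {"input_ids": [1, 5, 2]})
    print(received, outputs)
```

In a real deployment the `broadcast` call would presumably be a `torch.distributed` collective over the NCCL backend, and the driver would send a single RPC to worker 0 rather than one per worker; the sketch only illustrates which rank receives the param and which rank returns a result.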

Performance is still not good enough and needs further optimization.

LRY89757 · May 27 '24 07:05