DeepSpeed
[BUG] chatglm-6b cannot use DeepSpeed inference
Tensor-parallel inference requires (model size × number of GPUs) of memory, and it does not reduce latency. Are there any plans to support this model?
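The memory claim above can be sketched with some rough arithmetic (assuming fp16 weights at 2 bytes per parameter and ~6B parameters for chatglm-6b; the numbers are illustrative, not measured):

```python
def total_weight_memory_gb(num_params, bytes_per_param, num_gpus):
    # If the weights are fully replicated on every GPU (instead of
    # being sharded across them), total memory grows linearly with
    # the number of GPUs -- the behavior described in this report.
    per_gpu_gb = num_params * bytes_per_param / 1024**3
    return per_gpu_gb * num_gpus

# chatglm-6b: ~6e9 parameters, fp16 (2 bytes each)
print(f"1 GPU : {total_weight_memory_gb(6e9, 2, 1):.1f} GB")  # ~11.2 GB
print(f"4 GPUs: {total_weight_memory_gb(6e9, 2, 4):.1f} GB")  # ~44.7 GB total
```

With proper tensor-parallel sharding one would instead expect the per-GPU footprint to shrink as GPUs are added, which is what the report says is not happening here.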