UncleFB

Results: 6 comments by UncleFB

I'd like to know how DeepSpeed inference can be integrated with our own service. Simply running a script from the command line can't be very useful by itself, can it?
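The usual pattern is to build the inference engine once at process startup (e.g. via DeepSpeed's `init_inference`) and expose it behind an HTTP endpoint, so the model is not reloaded per request. A minimal stdlib-only sketch of that pattern follows; `FakeEngine` is a hypothetical stand-in for the real DeepSpeed engine, which is not assumed to be installed here:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeEngine:
    """Hypothetical stand-in for an engine built with deepspeed.init_inference(...)."""

    def generate(self, prompt: str) -> str:
        # Placeholder for the real model.generate(...) call.
        return prompt.upper()


# Construct the engine exactly once, at startup, not per request.
ENGINE = FakeEngine()


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = ENGINE.generate(payload.get("prompt", ""))

        # Return the generation as JSON.
        body = json.dumps({"text": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet.
        pass


def serve(port: int = 8000):
    """Block forever, serving inference requests on localhost."""
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()
```

In a real deployment `serve()` would be the process entry point, and with tensor parallelism each rank runs this same process under the `deepspeed` launcher while only rank 0 (or a front-end proxy) accepts traffic.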

It took seven minutes for the model to even start loading. And no matter whether I set tensor_parallel to 2 or 4, an OOM occurs. Isn't the model supposed to be loaded across multiple...

@mrwyattii Thank you for your help. We have eight 24 GB GPUs, but it seems I cannot choose which GPUs to use by setting CUDA_VISIBLE_DEVICES.
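One common pitfall: `CUDA_VISIBLE_DEVICES` is read only once, when the CUDA runtime initializes, so it must already be set in the environment of the process that loads the model; also, the `deepspeed` launcher manages device assignment itself and provides an `--include localhost:2,3`-style flag for selecting specific GPUs, which may take precedence over the environment variable. A small sketch of passing the variable down to a child process (the child here is a plain Python one-liner, standing in for an inference script):

```python
import os
import subprocess
import sys

# The variable must be in the child's environment *before* it starts;
# setting it after CUDA initialization has no effect.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="2,3")

# Stand-in for launching an inference script: the child just reports
# which devices it would see.
out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # → 2,3
```

Inside the child, GPUs 2 and 3 are renumbered as devices 0 and 1, so code should not hard-code physical indices.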

Thanks again. Another question: why does it take so long before my local model starts loading? I keep seeing the log message about waiting for the service to start, and...

> > @mrwyattii Thank you for your help. We have eight 24 GB GPUs, but it seems I cannot choose which GPUs to use by setting CUDA_VISIBLE_DEVICES. > > Please...

I also hit an error during multi-GPU inference: chatglm2 fails, while llama2 seems fine. /lightllm/lightllm/models/chatglm2/layer_infer/transformer_layer_infer.py:30: UserWarning: An output with one or more elements was resized since it had shape [6, 128], which does not match the required output shape [6, 256]. This behavior...