UncleFB

Results: 6 comments by UncleFB

I'd like to know how DeepSpeed inference can be integrated with our own service. Simply running a script from the command line can't be very useful by itself, can it?
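The usual pattern is to build the inference engine once at process startup (e.g. via DeepSpeed's `init_inference`) and expose it behind an HTTP endpoint, so the model is not reloaded per request. A minimal stdlib-only sketch of that pattern follows; `FakeEngine` is a hypothetical stand-in for the real DeepSpeed engine, which is not assumed to be installed here:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeEngine:
    """Hypothetical stand-in for an engine built with deepspeed.init_inference(...)."""

    def generate(self, prompt: str) -> str:
        # Placeholder for the real model.generate(...) call.
        return prompt.upper()


# Construct the engine exactly once, at startup, not per request.
ENGINE = FakeEngine()


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = ENGINE.generate(payload.get("prompt", ""))

        # Return the generation as JSON.
        body = json.dumps({"text": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet.
        pass


def serve(port: int = 8000):
    """Block forever, serving inference requests on localhost."""
    HTTPServer(("127.0.0.1", port), InferenceHandler).serve_forever()
```

In a real deployment `serve()` would be the process entry point, and with tensor parallelism each rank runs this same process under the `deepspeed` launcher while only rank 0 (or a front-end proxy) accepts traffic.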

It took seven minutes for the model to even start loading. And no matter whether I set tensor_parallel to 2 or 4, an OOM occurs. Isn't the model supposed to be loaded across multiple...

@mrwyattii Thank you for your help. We have eight 24 GB GPUs, but it seems I cannot choose which GPUs to use by setting CUDA_VISIBLE_DEVICES.
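One common pitfall: `CUDA_VISIBLE_DEVICES` is read only once, when the CUDA runtime initializes, so it must already be set in the environment of the process that loads the model; also, the `deepspeed` launcher manages device assignment itself and provides an `--include localhost:2,3`-style flag for selecting specific GPUs, which may take precedence over the environment variable. A small sketch of passing the variable down to a child process (the child here is a plain Python one-liner, standing in for an inference script):

```python
import os
import subprocess
import sys

# The variable must be in the child's environment *before* it starts;
# setting it after CUDA initialization has no effect.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="2,3")

# Stand-in for launching an inference script: the child just reports
# which devices it would see.
out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # → 2,3
```

Inside the child, GPUs 2 and 3 are renumbered as devices 0 and 1, so code should not hard-code physical indices.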

Thanks again. Another question: why does it take so long before my local model starts loading? I keep seeing the log message about waiting for the service to start, and...

> > @mrwyattii Thank you for your help. We have eight 24 GB GPUs, but it seems I cannot choose which GPUs to use by setting CUDA_VISIBLE_DEVICES. > > Please...

I also hit an error during multi-GPU inference: chatglm2 fails, while llama2 seems fine. /lightllm/lightllm/models/chatglm2/layer_infer/transformer_layer_infer.py:30: UserWarning: An output with one or more elements was resized since it had shape [6, 128], which does not match the required output shape [6, 256]. This behavior...