chenglimin
> This problem only appears in the Debian image; it does not occur when running applications in a filesystem built with busybox.
> command: **python -m fastchat.serve.cli --model-path ./model_weights/lmsys/vicuna-7b-delta-v1.1 --load-8bit**
>
> error info: OutOfMemoryError: CUDA out of memory. Tried to allocate 86.00 MiB (GPU 0; 6.00 GiB total capacity; 4.01 GiB...
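For context, a quick way to see how much headroom the card actually has before loading is to query it from PyTorch; a minimal sketch (the device index and the rough size estimate in the comment are assumptions, not taken from the report above):

```python
import torch

# torch.cuda.mem_get_info returns (free_bytes, total_bytes) for a device.
free, total = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")

# Rough estimate (assumption): a 7B-parameter model in 8-bit needs about
# 7 GiB for weights alone, so a 6 GiB card can OOM even with --load-8bit;
# offloading part of the model to CPU, if the serving stack supports it,
# is the usual fallback.
```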
I converted the model and predictor of Falcon-40B into PowerInfer GGUF as described in your README, and kept the directory layout as shown there. However, it comes with the...
Here are the contents of your "requirements.txt": "numpy>=1.24.4, sentencepiece>=0.1.98, transformers>=4.33.2, -e ./gguf-py, -e ./powerinfer-py". Here are my package versions: "numpy 1.26.2, sentencepiece 0.1.99, transformers 4.36.2".

> Can you confirm that...
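One way to rule out a version mismatch is to check the installed packages against those pins programmatically; a minimal sketch using the standard library plus `packaging` (the pins are copied from the requirements.txt quoted above):

```python
from importlib.metadata import version
from packaging.specifiers import SpecifierSet

# Pins copied from the requirements.txt quoted above.
PINS = {
    "numpy": ">=1.24.4",
    "sentencepiece": ">=0.1.98",
    "transformers": ">=4.33.2",
}

for pkg, spec in PINS.items():
    installed = version(pkg)
    ok = installed in SpecifierSet(spec)
    print(f"{pkg} {installed} satisfies '{spec}': {ok}")
```

Checked by hand, the three versions listed above all satisfy their pins, so a requirements mismatch is unlikely to be the cause here.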
My PyTorch version is also 2.1.2, as shown in the picture below. And when I run with the LLaMA-13B model, this problem never appears.

> I tested code around the error shown...
> ...what is the output length

Can PowerInfer support API server execution mode?
What are the dtype and activation function of Falcon-40B and OPT-30B when you evaluate vLLM on an A100? As far as I know, vLLM does not support ReLU; where do you...
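Both properties can usually be read off the Hugging Face config without downloading any weights; a minimal sketch with transformers' `AutoConfig` (the model IDs below are the public checkpoints and are assumptions about which ones were evaluated; attribute names vary by architecture, so both common ones are probed):

```python
from transformers import AutoConfig

# Model IDs are illustrative; substitute the exact checkpoints evaluated.
for model_id in ("facebook/opt-30b", "tiiuae/falcon-40b"):
    cfg = AutoConfig.from_pretrained(model_id)
    # OPT stores its activation under `activation_function`; many other
    # architectures use `hidden_act`; some expose neither.
    act = getattr(cfg, "activation_function", None) or getattr(cfg, "hidden_act", None)
    print(model_id, "dtype:", cfg.torch_dtype, "activation:", act)
```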
Where can I download the predictor for the OPT-30B model?
> > Where can I download the predictor for the OPT-30B model?
>
> We have not released the predictor of OPT models yet. The sparse inference implementation (code + predictor)...
What is the input length for PC-High in Figure 13 of your paper?