
How to run OPT models with PowerInfer?

Open wuooo339 opened this issue 11 months ago • 7 comments

Prerequisites

Before submitting your question, please ensure the following:

  • [x] I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
  • [x] I have carefully read and followed the instructions in the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).

Question Details

I have read the PowerInfer paper and saw that you use OPT-30B to compare against llama.cpp, but I cannot find any information about OPT in the README.

Additional Context

I want to test OPT models with PowerInfer, so I might need your help. I am from Harbin Institute of Technology, studying HPC (high-performance computing), and I have recently been trying different offloading strategies.

wuooo339 avatar Dec 24 '24 09:12 wuooo339

Due to limited bandwidth, this part of the model support hasn't been merged into the main branch yet. We plan to release the OPT-related code as soon as possible. Stay tuned.

jeremyyx avatar Dec 24 '24 10:12 jeremyyx


Hi, may I ask when the OPT model and its related code are expected to be released?

Ryuukinn55 avatar Jan 07 '25 13:01 Ryuukinn55

@YixinSong-e Now I have seen the code for OPT models, but how do I get the predictor for OPT and convert it for use with PowerInfer?

wuooo339 avatar Feb 24 '25 02:02 wuooo339


@wuooo339 @Ryuukinn55 Hi everyone! Our code for the OPT model has been officially released. Our predictor is now available on HuggingFace: https://huggingface.co/PowerInfer/OPT-7B-predictor. For other model sizes, such as 13B or larger, we will release the predictors within the next few days.

You can convert the model from the original version at https://huggingface.co/facebook/opt-6.7b using the convert.py script. First, download the model, and then run the following command:

python convert.py --outfile /PATH/TO/POWERINFER/GGUF/REPO/MODELNAME.powerinfer.gguf /PATH/TO/ORIGINAL/MODEL /PATH/TO/PREDICTOR

For any other questions, please feel free to ask!

AliceRayLu avatar Feb 24 '25 05:02 AliceRayLu
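The conversion step described above can be sketched as a helper that assembles the convert.py command line. The paths below are placeholders, not verified defaults; only the script name and argument order come from the thread:

```python
# Sketch: assemble the convert.py invocation described above.
# All paths here are illustrative placeholders.
import shlex

def build_convert_cmd(outfile: str, model_dir: str, predictor_dir: str) -> str:
    """Build the PowerInfer GGUF conversion command line."""
    parts = [
        "python", "convert.py",
        "--outfile", outfile,   # destination *.powerinfer.gguf
        model_dir,              # checkout of the original HF model (e.g. facebook/opt-6.7b)
        predictor_dir,          # checkout of the predictor repo (e.g. PowerInfer/OPT-7B-predictor)
    ]
    return " ".join(shlex.quote(p) for p in parts)

print(build_convert_cmd(
    "models/opt-6.7b.powerinfer.gguf",
    "models/opt-6.7b",
    "models/OPT-7B-predictor",
))
```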

@YixinSong-e I ran into this problem when running opt-6.7b on a 4080S GPU. The command is ./build/bin/main -m /share-data/wzk-1/model/powerinfer/opt-6.7b.powerinfer.gguf -n 32 -t 8 -p "Paris is the capital city of" --vram-budget 6.9, where opt-6.7b.powerinfer.gguf was converted from https://huggingface.co/facebook/opt-6.7b together with the predictor https://huggingface.co/PowerInfer/OPT-7B-predictor.

llm_load_gpu_split_with_budget: error: activation files under '/share-data/wzk-1/model/powerinfer/activation' not found
llm_load_gpu_split: error: failed to generate gpu split, an empty one will be used
offload_ffn_split: applying augmentation to model - please wait ...

wuooo339 avatar Mar 14 '25 09:03 wuooo339


Sorry, I found the activation files in the predictor repository. The converted model should be placed in the same directory as them.

wuooo339 avatar Mar 14 '25 11:03 wuooo339

Hello, I have a few questions.

In the convert.py file, at line 1215, I noticed that if the model_type is not Llama or Bamboo, a message saying "Trying with 'convert-hf-to-powerinfer-gguf.py'" is printed. (In fact, the class for the OPT model is defined in the convert-hf-to-powerinfer-gguf.py file.)

Given this, I'm wondering: although the example script specifies using convert.py for the OPT model, shouldn't we actually be using convert-hf-to-powerinfer-gguf.py instead?

Separately, I'd also like to ask: what is the difference between convert.py and convert-hf-to-powerinfer-gguf.py?

jieon814 avatar Jul 23 '25 08:07 jieon814
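The dispatch behavior described in the question above can be sketched as a fallback: handle the architectures convert.py supports natively, and delegate anything else to convert-hf-to-powerinfer-gguf.py. The function and set names here are assumptions for illustration; only the two script names, the two native architectures, and the printed message come from the thread:

```python
# Sketch of the fallback described above (around convert.py line 1215):
# unsupported architectures are delegated to convert-hf-to-powerinfer-gguf.py.
NATIVE_ARCHS = {"llama", "bamboo"}  # architectures convert.py handles directly

def pick_converter(model_type: str) -> str:
    """Choose which conversion script handles the given architecture."""
    if model_type.lower() in NATIVE_ARCHS:
        return "convert.py"
    print("Trying with 'convert-hf-to-powerinfer-gguf.py'")
    return "convert-hf-to-powerinfer-gguf.py"
```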