How to convert ProSparse-LLaMA-2-13B model to .gguf?
Prerequisites
Before submitting your question, please ensure the following:
- [x] I am running the latest version of PowerInfer. Development is rapid, and as of now, there are no tagged versions.
- [x] I have carefully read and followed the instructions in the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
Question Details
Please provide a clear and concise description of your question. If applicable, include steps to reproduce the issue or behaviors you've observed.
Additional Context
Please provide any additional information that may be relevant to your question, such as specific system configurations, environment details, or any other context that could be helpful in addressing your inquiry.
```
python convert.py --outfile ./llama_convert/prosparse-llama-2-13b.powerinfer.gguf ./prosparse-llama-2-13b ./prosparse-llama-2-13b-predictor
Model architecture True is not supported by this convert.py. Trying with convert-hf-to-powerinfer-gguf.py...
Loading model: prosparse-llama-2-13b
Traceback (most recent call last):
  File "/root/autodl-tmp/powerinfer/PowerInfer/convert-hf-to-powerinfer-gguf.py", line 609, in
```
Hello. Have you solved this bug?
Try replacing the config.json file in your /path/to/model with the vLLM version. Detailed info: https://huggingface.co/SparseLLM/prosparse-llama-2-13b
_Here are the steps for adapting the original vLLM to ProSparse models. Replace the file vllm/model_executor/models/llama.py in the original vLLM with this file. Replace the contents of the original config.json with this file. Set the environment variable ACT_INFO. To test the version without activation threshold shifting, export ACT_INFO=relu. To test the version with activation threshold shifting, export ACT_INFO=fatrelu_0.01._
The vLLM version of config.json sets the model architecture to "LlamaForCausalLM", which is supported by convert-hf-to-powerinfer-gguf.py.
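If you just want the converter to recognize the model, a minimal sketch of patching the local config.json is below. This is an assumption-laden shortcut: it only changes the "architectures" field, whereas the vLLM config.json linked on the model card may differ in other fields, so replacing the whole file as described above is the safer route. The model path is hypothetical.

```python
# Minimal sketch (assumption: only the "architectures" field needs to change;
# the full vLLM config.json from the model card may contain other fields).
import json
from pathlib import Path

config_path = Path("./prosparse-llama-2-13b/config.json")  # hypothetical local model path

config = json.loads(config_path.read_text())
print("before:", config.get("architectures"))

# Point the converter at the architecture name it supports.
config["architectures"] = ["LlamaForCausalLM"]

config_path.write_text(json.dumps(config, indent=2))
print("after:", config["architectures"])
```

After patching (or replacing the file with the vLLM version), rerun the conversion command from above.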