intel-extension-for-transformers
lm-eval for llama.cpp enhancement.
Type of Change
Enable lm-eval for llama.cpp models
API not changed
Description
This PR refers to the official lm-eval code and llama-cpp-python.
Improvements:
- Load the llama.cpp model directly when running lm-eval (the official code requires launching a llama.cpp server).
- For Qwen models, revise the detokenize function because errors occurred during evaluation, and force-add `bos_id` because llama-cpp-python does not add `bos_id` successfully. Even with these changes, I still find that the tokenizer results differ between llama.cpp and huggingface/transformers; I will verify this further.
- As described in the comments at llama-cpp-python, implement evaluation with a custom class, which accelerates the post-processing (see the sketch below).
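To make these workarounds concrete, here is a minimal sketch (not the PR's actual implementation) of loading a GGUF model directly with llama-cpp-python, force-adding `bos_id` during tokenization, and cross-checking against the huggingface/transformers tokenizer. The filename glob and the `tokenize_with_bos` helper are illustrative assumptions:

```python
# Minimal sketch, assuming llama-cpp-python (with huggingface-hub) and
# transformers are installed; not the exact code from this PR.
from llama_cpp import Llama
from transformers import AutoTokenizer

# Load the GGUF model straight from the Hugging Face Hub; no llama.cpp
# server needs to be launched.
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q4_0.gguf",
    logits_all=True,  # per-token logprobs are needed for lm-eval scoring
)

def tokenize_with_bos(text: str) -> list[int]:
    """Tokenize and force-prepend bos_id when llama-cpp-python misses it."""
    tokens = llm.tokenize(text.encode("utf-8"), add_bos=True)
    bos_id = llm.token_bos()  # may be -1 if the GGUF metadata lacks a BOS id
    if bos_id >= 0 and (not tokens or tokens[0] != bos_id):
        tokens = [bos_id] + tokens
    return tokens

# Cross-check against the transformers tokenizer; for Qwen the two token
# streams can still differ, which is the open question noted above.
hf_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
sample = "Hello, world!"
print("llama.cpp   :", tokenize_with_bos(sample))
print("transformers:", hf_tokenizer.encode(sample))
```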
⛈️ Required checks status: Has failure 🔴
Warning: If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.
Groups summary
🔴 Format Scan Tests workflow
| Check ID | Status | Error details | |
| --- | --- | --- | --- |
| format-scan (pylint) | failure | download | ❌ |
| format-scan (bandit) | success | | ✅ |
| format-scan (cloc) | success | | ✅ |
| format-scan (cpplint) | success | | ✅ |
These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py` and `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py`.
🔴 Optimize Unit Test workflow
| Check ID | Status | Error details | |
| --- | --- | --- | --- |
| optimize-unit-test-baseline | success | | ✅ |
| optimize-unit-test-PR-test | failure | download | ❌ |
| Genreate-OptimizeUT-Report | skipped | | ❓ |
These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py` and `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py`.
🟢 NeuralChat Unit Test
| Check ID | Status | Error details | |
| --- | --- | --- | --- |
| neuralchat-unit-test-baseline | success | | ✅ |
| neuralchat-unit-test-PR-test | success | | ✅ |
| Generate-NeuralChat-Report | success | | ✅ |
These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py` and `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py`.
🟢 Engine Unit Test workflow
| Check ID | Status | Error details | |
| --- | --- | --- | --- |
| engine-unit-test-baseline | success | | ✅ |
| engine-unit-test-PR-test | success | | ✅ |
| Genreate-Engine-Report | success | | ✅ |
These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py` and `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py`.
🟢 Chat Bot Test workflow
| Check ID | Status | Error details | |
| --- | --- | --- | --- |
| call-inference-llama-2-7b-chat-hf / inference test | success | | ✅ |
| call-inference-mpt-7b-chat / inference test | success | | ✅ |
These checks are required after the changes to `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/__init__.py` and `intel_extension_for_transformers/transformers/llm/evaluation/lm_eval/models/llama_cpp_lm.py`.
Thank you for your contribution! 💜
Note: This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.
Usage
CPU
```python
from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"
eval_args = LMEvalParser(
    model="gguf-custom",
    model_args="pretrained=" + model_name + ",ftype=*q4_0.gguf",
    device="cpu",
    tasks="hellaswag",
    batch_size=2,
    limit=10,
)
results = evaluate(eval_args)
print(results["results"])
```
GPU
```python
from intel_extension_for_transformers.transformers.llm.evaluation.lm_eval import evaluate, LMEvalParser

model_name = "Qwen/Qwen1.5-0.5B-Chat-GGUF"
eval_args = LMEvalParser(
    model="gguf-custom",
    model_args="pretrained=" + model_name + ",ftype=*q4_0.gguf",
    device="cuda",
    tasks="hellaswag",
    batch_size=2,
    limit=10,
)
results = evaluate(eval_args)
print(results["results"])
```
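For reference, a small sketch of pulling a metric out of the returned dict; the exact metric keys under `results["results"]["hellaswag"]` (e.g. `acc` vs. `acc,none`) vary across lm-eval versions, so the keys below are assumptions:

```python
# The metric key names are assumptions; different lm-eval versions emit
# "acc", "acc_norm", or comma-suffixed variants such as "acc,none".
hellaswag_metrics = results["results"]["hellaswag"]
print(sorted(hellaswag_metrics))  # list the keys your version actually produces
print("accuracy:", hellaswag_metrics.get("acc", hellaswag_metrics.get("acc,none")))
```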