Mandy Li
@jychen-habana , please test rope_scaling with Mixtral and update the results here.
@regisss , good point. I didn't know about this Jira when I worked on the type casting. We have this PR because our QA reported a perf regression...
Created https://github.com/huggingface/optimum-habana/pull/999, so this one can be closed.
@schoi-habana , please provide details of how you optimized Falcon-180b fp8 so that Jinyan can follow the same approach for this model. Thanks.
@jychen-habana , please post the performance measurements with/without this PR here.
@jychen-habana , please rebase onto the latest code in the OH main branch.
@jychen-habana , this PR doesn't work with the Synapse 1.15 release Docker image when running measurement.
Command: QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py --model_name_or_path /mnt/weka/data/mixtral/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/ --use_hpu_graphs --use_kv_cache --limit_hpu_graphs --bucket_size 128 --max_new_tokens 128 --batch_size 1 --bf16
Error: File...