Mandy Li

Results 7 comments of Mandy Li

@jychen-habana, please test rope_scaling with Mixtral and update the results here.

@regisss, good point. I didn't know about this Jira when I worked on the type casting. The reason we have this PR is that our QA reported a perf regression...

Created https://github.com/huggingface/optimum-habana/pull/999; this one can be closed.

@schoi-habana, please provide details on how you optimized Falcon-180b fp8 so that Jinyan can follow the same steps for this model. Thanks.

@jychen-habana, please post the performance measurements with and without this PR here.

@jychen-habana, please rebase onto the latest code in the OH main branch.

@jychen-habana, this PR doesn't work with the Synapse 1.15 release Docker image when running measurement:
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py --model_name_or_path /mnt/weka/data/mixtral/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/ --use_hpu_graphs --use_kv_cache --limit_hpu_graphs --bucket_size 128 --max_new_tokens 128 --batch_size 1 --bf16
Error: File...