Mandy Li comments

Results: 7 comments by Mandy Li

Support mixtral long sequence 32k with bs 4

@jychen-habana, please test rope_scaling with Mixtral and post the results here.

Fix RoPE data type issue for gpt_neox and stablelm (#177)

@regisss, good point. I didn't know about this Jira when I worked on the type casting. The reason we have this PR is that our QA reported a perf regression...

Fix RoPE data type issue for gpt_neox and stablelm (#177)

Created https://github.com/huggingface/optimum-habana/pull/999; this one can be closed.

Update Mixtral-8x7B Optimization

@schoi-habana, please provide details on how you optimized Falcon-180b fp8 so that Jinyan can follow the same approach for this model. Thanks.

Update Mixtral-8x7B Optimization

@jychen-habana, please post the performance measurements with and without this PR here.

Update Mixtral-8x7B Optimization

@jychen-habana, please rebase onto the latest code in the OH main branch.

Update Mixtral-8x7B Optimization

@jychen-habana, this PR doesn't work with the Synapse 1.15 release Docker image when running measurement. Command:
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_generation.py --model_name_or_path /mnt/weka/data/mixtral/models--mistralai--Mixtral-8x7B-Instruct-v0.1/snapshots/1e637f2d7cb0a9d6fb1922f305cb784995190a83/ --use_hpu_graphs --use_kv_cache --limit_hpu_graphs --bucket_size 128 --max_new_tokens 128 --batch_size 1 --bf16
Error: File...