Jinyan Chen

Results: 13 comments by Jinyan Chen

Breaking PR https://github.com/huggingface/optimum-habana/pull/836 into smaller pieces, based on PR https://github.com/huggingface/optimum-habana/pull/901.

> @jychen-habana, please test rope_scaling with Mixtral and update the results here. **Run with rope_scaling (add below to config.json):** `"rope_scaling": {"type":"linear","factor":2.0},` **Test case: --max_input_tokens 32000 --bucket_size 1024 --max_new_tokens 512...
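A minimal sketch of that test case, assuming only the flags quoted above and the Mixtral checkpoint used elsewhere in this thread; the rope_scaling entry goes into the checkpoint's config.json first, and any options elided in the quoted command are omitted here.

```
# Sketch only: first add the quoted entry to the local Mixtral config.json:
#   "rope_scaling": {"type": "linear", "factor": 2.0},
# then run the test case (flags beyond those quoted above are left out):
python run_generation.py \
    --model_name_or_path mistralai/Mixtral-8x7B-v0.1 \
    --use_hpu_graphs \
    --use_kv_cache \
    --max_input_tokens 32000 \
    --bucket_size 1024 \
    --max_new_tokens 512 \
    --bf16
```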

@regisss @libinta @mandy-li, please help review and merge this PR. Thanks!

Breaking PR https://github.com/huggingface/optimum-habana/pull/836 into smaller pieces, based on PR https://github.com/huggingface/optimum-habana/pull/898.

**Add test case input_32000 output_512**

**Command (with --limit_hpu_graphs, --reuse_cache, --bucket_internal, --bucket_size 256, --max_new_tokens 512):**

```
QUANT_CONFIG=./quantization_config/maxabs_quant_mixtral.json python run_generation.py --model_name_or_path mistralai/Mixtral-8x7B-v0.1 --use_hpu_graphs --limit_hpu_graphs --use_kv_cache --reuse_cache --bucket_internal --bucket_size 256 --max_new_tokens 512 --bf16...
```

**Add test case input_32000 output_700**

**Command (with --limit_hpu_graphs, --reuse_cache, --bucket_internal, --bucket_size 256, --max_new_tokens 700):**

```
QUANT_CONFIG=./quantization_config/maxabs_quant_mixtral.json python run_generation.py --model_name_or_path mistralai/Mixtral-8x7B-v0.1 --use_hpu_graphs --limit_hpu_graphs --use_kv_cache --reuse_cache --bucket_internal --bucket_size 256 --max_new_tokens 700 --bf16...
```
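For context, QUANT_CONFIG points run_generation.py at an Intel Neural Compressor quantization config. A rough sketch of what maxabs_quant_mixtral.json could contain, modeled on the generic maxabs_quant examples shipped with optimum-habana; the Mixtral-specific contents are an assumption:

```
# Assumption: fields follow the generic maxabs_quant example in
# optimum-habana's text-generation folder; the Mixtral file may differ.
cat > ./quantization_config/maxabs_quant_mixtral.json <<'EOF'
{
    "method": "HOOKS",
    "mode": "QUANTIZE",
    "observer": "maxabs",
    "scale_method": "maxabs_hw",
    "dump_stats_path": "./hqt_output/measure"
}
EOF
```

In the usual FP8 flow, a measurement run with a corresponding maxabs_measure config is done first, so the statistics referenced by dump_stats_path exist before quantized inference.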

close

> @jychen-habana Is this PR different from #903?

#903 is the latest; I will close this PR.

> I tested this PR with run_generation.py in the 1.16.0 docker. It could fit 30k input tokens, but the generated output was empty. Did you check the output? > > input...

> @jychen-habana, as we synced offline: > > 1. kv_cache_fp8 is the previous way to support fp8 inference, which will be removed soon. All the models' fp8 inference should...