Jinyan Chen

Results 3 issues of Jinyan Chen

# What does this PR do? Support long sequences 32k with **bs4** (move Q slicing inside loop to save memory) **Before** OutofMemory **After** **Basic Command (with --limit_hpu_graphs, --reuse_cache, --bucket_internal, --batch_size...
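The memory idea behind "Q slicing inside loop" can be illustrated with a minimal pure-Python sketch. This is not the PR's code: the function names (`attend_block`, `attend_q_sliced`) and the `block_size` parameter are hypothetical, there is no causal mask, and nothing here is HPU-specific. It only shows that processing queries one slice at a time keeps the live score matrix at `block_size x kv_len` instead of `q_len x kv_len`, while producing the same output.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend_block(q_block, k, v):
    # Scaled dot-product attention for a block of query rows.
    # The full score matrix for this block (len(q_block) x len(k))
    # is materialized here, so its size is what slicing bounds.
    scale = 1.0 / math.sqrt(len(k[0]))
    scores = [[scale * sum(a * b for a, b in zip(q, key)) for key in k]
              for q in q_block]
    weights = [softmax(row) for row in scores]
    dim_v = len(v[0])
    return [[sum(w * val[j] for w, val in zip(row, v)) for j in range(dim_v)]
            for row in weights]

def attend_q_sliced(q, k, v, block_size):
    # Hypothetical helper: slice Q inside the loop so only one
    # block's score matrix exists at any time.
    out = []
    for start in range(0, len(q), block_size):
        q_block = q[start:start + block_size]  # slice taken per iteration
        out.extend(attend_block(q_block, k, v))
    return out
```

Because softmax is applied per query row, slicing along the query dimension does not change the result; only the peak size of the intermediate score matrix changes.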

# What does this PR do? Support long sequences such as 32000 with a max **bs of 2** (slicing Q in attention). **Short sequences** **Command** ``` QUANT_CONFIG=./quantization_config/maxabs_quant_mixtral.json python run_generation.py --model_name_or_path mistralai/Mixtral-8x7B-v0.1 --use_hpu_graphs...

# What does this PR do? - Update Mixtral-8x7B Optimization: reuse_cache / enable FP8 KV Cache / FP8 Attn / bucket_internal ... - Support long sequence prompt ```shell QUANT_CONFIG=./quantization_config/maxabs_measure.json python...