[Gemma v2] Enable Gemma v2 on Gaudi
What does this PR do?
This PR enables both training and inference of Gemma 2 on Gaudi devices, and also adds static shape support. The performance results are shown below:
```
python run_generation.py --model_name_or_path ../../../gemma-2-9b/ --bf16 --use_kv_cache --reuse_cache --use_hpu_graphs --max_input_tokens 128 --max_new_tokens 128 --batch_size 4
```
Input/outputs:
```
input 1: ('DeepSpeed is a machine learning framework',)
output 1: ('DeepSpeed is a machine learning framework that enables training of large-scale deep learning models on a single GPU or across multiple GPUs. It is designed to be easy to use and highly scalable, making it a powerful tool for researchers and practitioners working with large-scale deep learning models.\n\nDeepSpeed is built on top of PyTorch, a popular deep learning framework, and provides a set of tools and libraries that make it easy to train large-scale models. It includes features such as zero-shot inference, which allows models to be used for inference without the need for retraining, and distributed training, which enables models to be trained across multiple GPUs.\n\nDeepSpeed is also',)
```
Stats:
```
--------------------------------------------------------------------------------------------------------------
Throughput (including tokenization) = 347.9053767194258 tokens/second
Number of HPU graphs = 20
Memory allocated = 20.74 GB
Max memory allocated = 20.82 GB
Total memory available = 94.62 GB
Graph compilation duration = 5.919518291004351 seconds
--------------------------------------------------------------------------------------------------------------
```
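For context, here is a minimal sketch of how the same inference path could be exercised from Python rather than through `run_generation.py`. It assumes a local `gemma-2-9b` checkpoint (the path mirrors the command above) and uses optimum-habana's `adapt_transformers_to_gaudi` hook; HPU-graph wrapping and static-shape bucketing, which the CLI flags enable, are omitted here for brevity:

```python
import torch
import habana_frameworks.torch  # registers the "hpu" device with PyTorch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Patch transformers with the Gaudi-optimized model implementations,
# including the Gemma 2 support added in this PR.
adapt_transformers_to_gaudi()

model_path = "../../../gemma-2-9b/"  # local checkpoint path, as in the command above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model = model.eval().to("hpu")

inputs = tokenizer("DeepSpeed is a machine learning framework", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The throughput reported above additionally relies on HPU graphs (`--use_hpu_graphs`) and KV-cache reuse (`--reuse_cache`), which `run_generation.py` sets up on top of this basic flow.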
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?