LLM-TPU
Support for Llama 3.1 model
Are there instructions specific to creating a bmodel from ONNX for Llama 3.1 (not Llama 3)?
Running the following command errors out:

```shell
python export_onnx.py --model_path ../../../../Meta-Llama-3.1-8B-Instruct/ --seq_length 1024
```
Output:

```
Convert block & block_cache
  0%|          | 0/32 [00:00<?, ?it/s]
The attention layers in this model are transitioning from computing the RoPE embeddings internally through position_ids (2D tensor with the indexes of the tokens), to using externally computed position_embeddings (Tuple of tensors, containing cos and sin). In v4.45 position_ids will be removed and position_embeddings will be mandatory.
```
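For what it's worth, the message above is a transformers deprecation warning about the RoPE refactor: it says the `position_ids`-based path still works but will be removed in v4.45. One workaround (an assumption based on the warning text, not an official LLM-TPU requirement) is to pin transformers below 4.45 so the export script's existing code path keeps working. A small sketch that checks the installed version before running the export:

```python
# Sketch: check whether the installed transformers predates the
# position_ids removal the warning mentions (v4.45). The 4.45 cutoff
# is taken from the warning text itself, not from LLM-TPU docs.
import importlib.metadata


def parse_version(v: str) -> tuple:
    """Parse a dotted version string into a tuple of leading integers,
    ignoring any non-numeric suffix such as '.dev0'."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


try:
    ver = importlib.metadata.version("transformers")
except importlib.metadata.PackageNotFoundError:
    ver = None

if ver is None:
    print("transformers is not installed")
elif parse_version(ver) >= (4, 45):
    print(f"transformers {ver}: position_ids removed; "
          "consider `pip install 'transformers<4.45'`")
else:
    print(f"transformers {ver}: position_ids path still available")
```

If pinning is not an option, the export script would need updating to pass externally computed `position_embeddings` instead, per the warning.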