LLM-TPU
Support for Llama 3.1 model
Are there instructions specific to creating a bmodel from ONNX for Llama 3.1 (not Llama 3)?
Running the following command errors out:

```shell
python export_onnx.py --model_path ../../../../Meta-Llama-3.1-8B-Instruct/ --seq_length 1024
```
Output:

```
Convert block & block_cache
  0%|          | 0/32 [00:00<?, ?it/s]
The attention layers in this model are transitioning from computing the RoPE embeddings internally through position_ids (2D tensor with the indexes of the tokens), to using externally computed position_embeddings (Tuple of tensors, containing cos and sin). In v4.45 position_ids will be removed and position_embeddings will be mandatory.
```
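For what it's worth, the message above is a transformers deprecation warning about the RoPE refactor: it says the `position_ids`-based path still works but will be removed in v4.45. One workaround (an assumption based on the warning text, not an official LLM-TPU requirement) is to pin transformers below 4.45 so the export script's existing code path keeps working. A small sketch that checks the installed version before running the export:

```python
# Sketch: check whether the installed transformers predates the
# position_ids removal the warning mentions (v4.45). The 4.45 cutoff
# is taken from the warning text itself, not from LLM-TPU docs.
import importlib.metadata


def parse_version(v: str) -> tuple:
    """Parse a dotted version string into a tuple of leading integers,
    ignoring any non-numeric suffix such as '.dev0'."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


try:
    ver = importlib.metadata.version("transformers")
except importlib.metadata.PackageNotFoundError:
    ver = None

if ver is None:
    print("transformers is not installed")
elif parse_version(ver) >= (4, 45):
    print(f"transformers {ver}: position_ids removed; "
          "consider `pip install 'transformers<4.45'`")
else:
    print(f"transformers {ver}: position_ids path still available")
```

If pinning is not an option, the export script would need updating to pass externally computed `position_embeddings` instead, per the warning.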