amazon-sagemaker-examples
[Example Request] Example to host Llama 2 13B on SageMaker using AWS Neuron
https://github.com/aws/amazon-sagemaker-examples-community/blob/main/torchserve/inf2/llama2/llama-2-13b.ipynb
Following the notebook above, I could create an endpoint for the Llama 2 13B base model. For the 13B chat model, however, endpoint creation fails with a timeout error on the primary container.
For the 13B chat model, I created the Neuron artifacts following these instructions: https://github.com/pytorch/serve/blob/master/examples/large_models/inferentia2/llama2/Readme.md?plain=1#L56. With those artifacts I could start TorchServe and run inference with a curl command, so the model artifacts appear to be fine. However, the same artifacts do not work with the notebook linked at the top of this issue.