amazon-sagemaker-examples icon indicating copy to clipboard operation
amazon-sagemaker-examples copied to clipboard

[Example Request] Example to host llama 13b on sagemaker using neurons

Open sayli-ds opened this issue 7 months ago • 0 comments

https://github.com/aws/amazon-sagemaker-examples-community/blob/main/torchserve/inf2/llama2/llama-2-13b.ipynb

Could create an endpoint as above for llama 13b base, but it gives a timeout error on container primary for 13b chat.

For above, created the neuron artifacts for the 13b chat model using this - https://github.com/pytorch/serve/blob/master/examples/large_models/inferentia2/llama2/Readme.md?plain=1#L56 Could start torchserve and run inference via curl command here, so the model artifacts look okay. But the same artifacts won't work in the first notebook reference link.

sayli-ds avatar Nov 22 '23 17:11 sayli-ds