amazon-sagemaker-examples [Example Request] Example to host llama 13b on sagemaker using neurons

Open sayli-ds opened this issue 7 months ago • 0 comments

https://github.com/aws/amazon-sagemaker-examples-community/blob/main/torchserve/inf2/llama2/llama-2-13b.ipynb

Following the notebook above, I could create an endpoint for the llama 13b base model. For the 13b chat model, however, endpoint creation fails with a timeout error on the primary container.

For the 13b chat model, I created the neuron artifacts by following this guide: https://github.com/pytorch/serve/blob/master/examples/large_models/inferentia2/llama2/Readme.md?plain=1#L56 With those artifacts I could start torchserve and run inference via the curl command given there, so the model artifacts look okay. But the same artifacts won't work with the notebook linked at the top.

Nov 22 '23 17:11 sayli-ds
