
Preferred / cheapest way to host TF Serving in the cloud?

Open supermoos opened this issue 3 years ago • 4 comments

I'm wondering if you have any recommendations for hosting this in the cloud cost-effectively. I've tried a Vertex AI model endpoint on Google Cloud. It's easy to set up and offers good GPU acceleration, but it seems to require 1 instance and 1 GPU to be allocated at all times, which quickly adds up to around $250 a month because it runs 24/7.

Can you recommend a setup, service, or platform where it still runs fast (it probably needs GPU acceleration) but can spin down when not actively used? In my use case the endpoint might need to serve predictions every 5-15 minutes, and sometimes it may not be needed at all for 12 hours.

Thanks!
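For readers weighing the same trade-off, here is a rough sketch of the arithmetic behind the question. It uses only the $250/month figure and the 5-15 minute request interval mentioned above; `SECONDS_PER_REQUEST` is a made-up assumption for illustration, not a measured value.

```python
# Back-of-the-envelope numbers for an always-on endpoint vs. actual usage.
HOURS_PER_MONTH = 24 * 30          # a 30-day month running around the clock
SECONDS_PER_REQUEST = 2.0          # ASSUMPTION: hypothetical time per prediction


def hourly_rate(monthly_cost: float) -> float:
    """Hourly price implied by a flat monthly bill."""
    return monthly_cost / HOURS_PER_MONTH


def requests_per_month(interval_minutes: float) -> float:
    """Number of requests if one arrives every `interval_minutes`."""
    return HOURS_PER_MONTH * 60 / interval_minutes


def busy_fraction(interval_minutes: float, seconds_per_request: float) -> float:
    """Share of wall-clock time the endpoint is actually doing work."""
    return seconds_per_request / (interval_minutes * 60)


if __name__ == "__main__":
    print(f"implied always-on rate: ${hourly_rate(250.0):.3f}/hour")
    for interval in (5, 15):
        n = requests_per_month(interval)
        frac = busy_fraction(interval, SECONDS_PER_REQUEST)
        print(f"every {interval} min: {n:.0f} requests/month, "
              f"busy {frac:.2%} of the time")
```

Under that assumed request duration the endpoint would sit idle for well over 99% of the month, which is why a scale-to-zero or pay-per-request option is attractive here.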

supermoos avatar Nov 25 '21 19:11 supermoos

So I figured out that AI Platform has very basic support for some machine types on the global endpoint that only run when requested. However, those machine types are so limited in resources that my 246 MB model runs out of resources on them, and the n-xxx machine types do not support scaling down to 0. I still haven't found a good cheap way to run this in the cloud, so I'm just leaving this info here for other interested parties.

supermoos avatar Dec 01 '21 08:12 supermoos

Kubernetes with the smallest machine that can handle your model + using spot instances should do the job.
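A minimal sketch of what that could look like on GKE, assuming the pods are pinned to Spot VM nodes via the `cloud.google.com/gke-spot` node label. The Deployment name, the image name, and the model name below are hypothetical placeholders; the image is assumed to be a `tensorflow/serving`-based image with the model already copied to `/models/<MODEL_NAME>`. This only generates a manifest and does not add scale-to-zero on its own.

```python
import json


def tf_serving_deployment(name: str, image: str, model_name: str,
                          replicas: int = 1) -> dict:
    """Build a Kubernetes Deployment manifest for TF Serving on Spot nodes."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Schedule only onto GKE Spot VM nodes.
                    "nodeSelector": {"cloud.google.com/gke-spot": "true"},
                    "containers": [{
                        "name": "tf-serving",
                        "image": image,
                        # The tensorflow/serving image reads MODEL_NAME
                        # and serves /models/<MODEL_NAME>.
                        "env": [{"name": "MODEL_NAME", "value": model_name}],
                        # 8501 is TF Serving's REST API port.
                        "ports": [{"containerPort": 8501}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = tf_serving_deployment(
        name="tf-serving",                          # hypothetical name
        image="example.com/my-tf-serving:latest",   # hypothetical image
        model_name="my_model",                      # hypothetical model
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.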

piEsposito avatar Dec 02 '21 18:12 piEsposito

Does what Pi suggested work for your use case? Here is the link for Spot VMs: https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms

guanxinq avatar Dec 23 '21 16:12 guanxinq

I haven't tested Spot VMs yet. But Amazon launched Serverless Inference as a preview service, and it works very well so far.
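For anyone wanting to try the same route, here is a rough sketch of the endpoint configuration that SageMaker Serverless Inference expects, assuming a model has already been registered in SageMaker. The config name, model name, memory size, and concurrency are illustrative assumptions, not values from this thread. The snippet only builds the request arguments; the actual boto3 call is left as a comment because it needs AWS credentials.

```python
def serverless_endpoint_config(config_name: str, model_name: str,
                               memory_mb: int = 4096,
                               max_concurrency: int = 1) -> dict:
    """Build keyword arguments for SageMaker's create_endpoint_config."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            # ServerlessConfig replaces the instance type and count
            # used by always-on endpoints.
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


if __name__ == "__main__":
    kwargs = serverless_endpoint_config(
        config_name="tf-model-serverless",   # hypothetical name
        model_name="my-registered-model",    # hypothetical, must already exist
    )
    print(kwargs)
    # With boto3 installed and AWS credentials configured:
    #   import boto3
    #   boto3.client("sagemaker").create_endpoint_config(**kwargs)
```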

supermoos avatar Dec 23 '21 17:12 supermoos

@supermoos,

Since Amazon SageMaker Serverless Inference works for you, please let us know if this issue can be closed.

You can also try a cluster or node pool with Spot VMs on GCP for a cheap solution. Ref: Spot VMs (https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms)

Thank you!

singhniraj08 avatar Jan 18 '23 10:01 singhniraj08