
how to serve Gradio app on the cloud and run inference locally

Open ouhenio opened this issue 1 year ago • 0 comments

Hi!

I'm building my own LLM, and I would like to serve it with FastChat. My idea is to deploy the Gradio app on AWS or GCP, and run the LLM inference locally on my own cluster. Is this possible? How could I set up something like this? Where would I need to run the controller?
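For reference, the split described above would presumably map onto FastChat's three standard processes: a controller, one or more model workers, and the Gradio web server. A rough sketch of the commands (hostnames, ports, and the model path are placeholders; the worker must be able to reach the controller, and the controller must be able to reach the worker's advertised address, so the local cluster would need to be reachable from the cloud, e.g. via a VPN or tunnel):

```shell
# On the cloud VM (AWS/GCP): controller + Gradio web server.
# The controller keeps the registry of available model workers.
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001

# The Gradio web server talks only to the controller.
python3 -m fastchat.serve.gradio_web_server \
    --controller-url http://localhost:21001 \
    --host 0.0.0.0 --port 7860

# On the local cluster: the model worker that runs inference.
# CLOUD_HOST and LOCAL_HOST are placeholders for routable addresses.
python3 -m fastchat.serve.model_worker \
    --model-path /path/to/your/model \
    --controller-address http://CLOUD_HOST:21001 \
    --worker-address http://LOCAL_HOST:21002 \
    --host 0.0.0.0 --port 21002
```

The worker registers itself with the controller at startup and sends periodic heartbeats, so the controller needs to sit wherever both the web server and the workers can reach it; co-locating it with the Gradio app on the cloud VM is the simplest arrangement.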

ouhenio avatar Jun 17 '24 16:06 ouhenio