FastChat
how to serve Gradio app on the cloud and run inference locally
Hi!
I'm building my own LLM, and I would like to serve it with FastChat. My idea is to deploy the Gradio app on AWS or GCP and run the LLM inference locally on my own cluster. Is this possible, and how could I set up something like this? Where would I need to run the controller?
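To make the setup concrete, here is a rough sketch of what I have in mind, using FastChat's standard entry points (the IPs, ports, and model path are placeholders):

```shell
# On the cloud VM (AWS/GCP): run the controller and the Gradio web server
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001
python3 -m fastchat.serve.gradio_web_server --controller-url http://localhost:21001

# On my local cluster: run a model worker that registers with the remote controller
python3 -m fastchat.serve.model_worker \
    --model-path /path/to/my-model \
    --controller-address http://<cloud-vm-ip>:21001 \
    --worker-address http://<local-cluster-ip>:21002 \
    --host 0.0.0.0 --port 21002
```

My main concern is the worker address: the controller needs to reach the worker back over HTTP, and my cluster sits behind NAT. Is this layout right, or should the controller live on the cluster instead?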