ray-llm
Possible to run on a single 8x A100 machine on-premise?
I would like to run Aviary on a single on-premise machine, but I am not able to get the models to load: it looks for actor/worker resource nodes that don't exist. Do you have an example config for a single on-premise machine?
Aviary requires a Ray Cluster to run. You can set up an on-premise Ray Cluster (https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/on-premises.html). Because Aviary uses Ray Custom Resources to ensure that each model is scheduled on an intended GPU type, you will need to set those in both the Ray cluster configuration and Aviary model yamls.
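As a rough sketch of what that looks like on the cluster side (the IPs, SSH user, and the `accelerator_type_a100` resource name are placeholders; pick a resource name that matches your GPU type), the on-prem cluster yaml can attach the custom resource in the Ray start commands:

```yaml
# Sketch of an on-prem Ray cluster config using the local node provider.
# IPs, ssh_user, and the custom resource name are placeholders.
cluster_name: aviary-onprem
provider:
  type: local
  head_ip: 10.0.0.1
  worker_ips: [10.0.0.2]
auth:
  ssh_user: ubuntu
# Attach the custom resource when Ray starts on each node, so Aviary
# can schedule each model onto the intended GPU type:
head_start_ray_commands:
  - ray stop
  - ray start --head --resources '{"accelerator_type_a100": 1}'
worker_start_ray_commands:
  - ray stop
  - ray start --address=$RAY_HEAD_IP:6379 --resources '{"accelerator_type_a100": 1}'
```

The same resource name then goes into the model yamls, so Ray's scheduler can match models to nodes.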
You can edit the EC2 cluster config to target your on-prem nodes and desired node type instead.
Alternatively, if you just want to experiment, you can do the following:
- SSH into your GPU node,
- load the Docker image / install Aviary locally with `pip install -e ".[backend, frontend]"`,
- edit the `scaling_config` section in the model configuration and change `accelerator_type_[TYPE]` to `accelerator_type_a100`,
- start Ray with `ray start --resources "{\"accelerator_type_a100\": 1}"` (the actual number of GPUs will be detected automatically),
- start Aviary with `aviary run --model model_yaml_with_edited_scaling_config.yaml`
This will start a Ray cluster composed of just this single node.
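For illustration, the edited `scaling_config` in the model yaml would look roughly like this (the fields besides the custom resource are illustrative, not the exact Aviary schema; the key point is that `accelerator_type_a100` matches the resource passed to `ray start`):

```yaml
# Excerpt from the model yaml -- a sketch, not the full schema.
scaling_config:
  num_workers: 1
  num_gpus_per_worker: 1
  resources_per_worker:
    # Must match the custom resource name given to `ray start --resources`
    accelerator_type_a100: 0.01
```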
Perfect, thank you. Got it all working, along with the frontend in a Docker container. One problem I encountered was that both the frontend and backend default to port 8000, so the frontend needed to be started like this: `serve run --host 0.0.0.0 --port 7860 aviary.frontend.app:app`
@Yard1 what do you think about making the frontend run on port 7860 by default to be consistent with normal Gradio and not cause this problem?
I think that's a good idea!