ray-llm
Cannot specify models using yaml as a list of dicts or LLMApp objects
Hello. Specifying models inline in the Serve config does not work, but it does if you define args.models as a list of YAML file paths. Example:
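For reference, the file-path form that does work for me looks roughly like this (the model file path below is only an illustration; point it at wherever your model YAML lives):

```yaml
http_options:
  host: 0.0.0.0
applications:
- name: ray-llm
  route_prefix: /
  import_path: rayllm.backend:router_application
  args:
    models:
    - ./models/meta-llama--Llama-2-7b-chat-hf.yaml
```

The inline form that fails is shown below: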
(base) ray@raycluster-llm-head-rv27x:~/serve_configs$ cat meta-llama--Llama-2-7b-chat-hf-full.yaml
http_options:
  host: 0.0.0.0
applications:
- name: ray-llm
  route_prefix: /
  import_path: rayllm.backend:router_application
  args:
    models:
    - deployment_config:
        autoscaling_config:
          min_replicas: 1
          initial_replicas: 1
          max_replicas: 8
          target_num_ongoing_requests_per_replica: 24
          metrics_interval_s: 10.0
          look_back_period_s: 30.0
          smoothing_factor: 0.5
          downscale_delay_s: 300.0
          upscale_delay_s: 15.0
        max_concurrent_queries: 64
        ray_actor_options:
          num_cpus: 2
      engine_config:
        model_id: meta-llama/Llama-2-7b-chat-hf
        hf_model_id: meta-llama/Llama-2-7b-chat-hf
        type: VLLMEngine
        engine_kwargs:
          trust_remote_code: true
          max_num_batched_tokens: 4096
          max_num_seqs: 64
          gpu_memory_utilization: 0.95
        max_total_tokens: 4096
        generation:
          prompt_format:
            system: "<<SYS>>\n{instruction}\n<</SYS>>\n\n"
            assistant: " {instruction} </s><s>"
            trailing_assistant: ""
            user: "[INST] {system}{instruction} [/INST]"
            system_in_user: true
            default_system_message: ""
          stopping_sequences: ["<unk>"]
      scaling_config:
        num_workers: 1
        num_gpus_per_worker: 1
        num_cpus_per_worker: 2
        placement_strategy: "STRICT_PACK"
        resources_per_worker:
(base) ray@raycluster-llm-head-rv27x:~/serve_configs$ serve run meta-llama--Llama-2-7b-chat-hf-full.yaml
2023-12-27 13:44:15,621 INFO scripts.py:418 -- Running config file: 'meta-llama--Llama-2-7b-chat-hf-full.yaml'.
2023-12-27 13:44:15,626 INFO worker.py:1458 -- Connecting to existing Ray cluster at address: 10.233.86.89:6379...
2023-12-27 13:44:15,635 INFO worker.py:1633 -- Connected to Ray cluster. View the dashboard at 10.233.86.89:8265
(ServeController pid=4989) INFO 2023-12-27 13:44:16,554 controller 4989 application_state.py:183 - Recovering target state for application 'ray-llm' from checkpoint.
(HTTPProxyActor pid=5029) INFO 2023-12-27 13:44:17,438 http_proxy 10.233.86.89 http_proxy.py:1433 - Proxy actor 3c92662700463eea872f1cfd16000000 starting on node d9722feee6b275abc23643cae31fe546b1dd6ba3187a250e67318a8c.
(HTTPProxyActor pid=5029) INFO 2023-12-27 13:44:17,445 http_proxy 10.233.86.89 http_proxy.py:1617 - Starting HTTP server on node: d9722feee6b275abc23643cae31fe546b1dd6ba3187a250e67318a8c listening on port 8000
2023-12-27 13:44:17,475 SUCC scripts.py:514 -- Submitted deploy config successfully.
(ServeController pid=4989) INFO 2023-12-27 13:44:17,472 controller 4989 application_state.py:374 - Starting build_serve_application task for application 'ray-llm'.
(HTTPProxyActor pid=5029) INFO: Started server process [5029]
(build_serve_application pid=5062) [WARNING 2023-12-27 13:44:21,054] api.py: 382 DeprecationWarning: `route_prefix` in `@serve.deployment` has been deprecated. To specify a route prefix for an application, pass it into `serve.run` instead.
(ServeController pid=4989) WARNING 2023-12-27 13:44:21,159 controller 4989 application_state.py:663 - Deploying app 'ray-llm' failed with exception:
(ServeController pid=4989) Traceback (most recent call last):
(ServeController pid=4989) File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/application_state.py", line 909, in build_serve_application
(ServeController pid=4989) app = call_app_builder_with_args_if_necessary(import_attr(import_path), args)
(ServeController pid=4989) File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/serve/_private/api.py", line 377, in call_app_builder_with_args_if_necessary
(ServeController pid=4989) app = builder(args)
(ServeController pid=4989) File "/home/ray/anaconda3/lib/python3.9/site-packages/rayllm/backend/server/run.py", line 114, in router_application
(ServeController pid=4989) router_args = RouterArgs.parse_obj(args)
(ServeController pid=4989) File "pydantic/main.py", line 526, in pydantic.main.BaseModel.parse_obj
(ServeController pid=4989) File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
(ServeController pid=4989) pydantic.error_wrappers.ValidationError: 7 validation errors for RouterArgs
(ServeController pid=4989) models
(ServeController pid=4989) str type expected (type=type_error.str)
(ServeController pid=4989) models
(ServeController pid=4989) value is not a valid dict (type=type_error.dict)
(ServeController pid=4989) models -> 0
(ServeController pid=4989) str type expected (type=type_error.str)
(ServeController pid=4989) models -> 0 -> engine_config -> engine_kwargs
(ServeController pid=4989) extra fields not permitted (type=value_error.extra)
(ServeController pid=4989) models -> 0 -> engine_config -> generation
(ServeController pid=4989) extra fields not permitted (type=value_error.extra)
(ServeController pid=4989) models -> 0 -> engine_config -> hf_model_id
(ServeController pid=4989) extra fields not permitted (type=value_error.extra)
(ServeController pid=4989) models -> 0 -> engine_config -> max_total_tokens
(ServeController pid=4989) extra fields not permitted (type=value_error.extra)
(ServeController pid=4989)
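The errors read like pydantic trying each branch of a union type for `models` (a path string, a config object, or a list of either) and rejecting every branch, with the `extra fields not permitted` entries suggesting the installed rayllm's schema does not know fields like `hf_model_id`. A minimal sketch reproducing the same error shape (these classes are illustrative only, not rayllm's actual ones):

```python
# Illustrative models only -- NOT rayllm's real classes. They mimic a schema
# where `models` accepts file-path strings (or parsed config objects) and the
# engine config forbids unknown keys, producing the same error pattern.
from typing import List, Union

from pydantic import BaseModel, ValidationError


class EngineConfig(BaseModel):
    model_id: str
    type: str

    class Config:
        extra = "forbid"  # unknown keys -> "extra fields not permitted"


class ModelConfig(BaseModel):
    engine_config: EngineConfig


class RouterArgs(BaseModel):
    # A union like this yields one error per rejected branch, matching
    # the multiple "str type expected" / "not a valid dict" entries above.
    models: Union[str, ModelConfig, List[Union[str, ModelConfig]]]


# File-path form: validates against the List[str] branch.
RouterArgs.parse_obj({"models": ["./models/my-model.yaml"]})

# Inline-dict form with a key the schema doesn't know: every branch fails.
try:
    RouterArgs.parse_obj({
        "models": [{
            "engine_config": {
                "model_id": "meta-llama/Llama-2-7b-chat-hf",
                "type": "VLLMEngine",
                "hf_model_id": "meta-llama/Llama-2-7b-chat-hf",  # extra key
            }
        }]
    })
except ValidationError as err:
    print(err)
```

If that is what is happening, the inline dicts could only validate if the running rayllm version actually declares those fields on its config models, which may be worth checking against the installed package.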