No deployments available for selected model
Thank you for your work. I am running the spider and calc_x tasks, and after specifying a local model, I consistently encounter the errors "No deployments available for selected model" or "The model Qwen/Qwen2.5-0.5B-Instruct does not exist." My agentlightning is installed from the latest source, with verl version 0.6.0 and vllm version 0.10.2. The full traceback points to a litellm.types.router.RouterRateLimitError and a subsequent NotFoundError related to the model's availability or deployment via the LiteLLM router. Could you please advise on the proper configuration steps for using this local model, as the current setup seems to be failing to connect or recognize the deployment?
ERROR: Exception occured - No deployments available for selected model, Try again in 5 seconds. Passed model=Qwen/Qwen2.5-0.5B-Instruct. pre-call-checks=False, cooldown_list=['3dfdf3fc7bce8b4337b0c174251a6cff311d9f85b7afb2d3214bdbdfecd3b907']
Traceback (most recent call last):
File "litellm/proxy/proxy_server.py", line 4782, in chat_completion
result = await base_llm_response_processor.base_process_llm_request(
File "litellm/proxy/common_request_processing.py", line 502, in base_process_llm_request
responses = await llm_responses
File "litellm/router.py", line 1093, in acompletion
raise e
File "litellm/router.py", line 1069, in acompletion
response = await self.async_function_with_fallbacks(**kwargs)
File "litellm/router.py", line 4037, in async_function_with_fallbacks
return await self.async_function_with_fallbacks_common_utils(
File "litellm/router.py", line 3995, in async_function_with_fallbacks_common_utils
raise original_exception
File "litellm/router.py", line 4029, in async_function_with_fallbacks
response = await self.async_function_with_retries(*args, **kwargs)
File "litellm/router.py", line 4151, in async_function_with_retries
self.should_retry_this_error(
File "litellm/router.py", line 4350, in should_retry_this_error
raise error
File "litellm/router.py", line 4125, in async_function_with_retries
response = await self.make_call(original_function, *args, **kwargs)
File "litellm/router.py", line 4245, in make_call
response = await response
File "litellm/router.py", line 1372, in _acompletion
raise e
File "litellm/router.py", line 1246, in _acompletion
deployment = await self.async_get_available_deployment(
File "litellm/router.py", line 7337, in async_get_available_deployment
raise e
File "litellm/router.py", line 7229, in async_get_available_deployment
healthy_deployments = await self.async_get_healthy_deployments(
File "litellm/router.py", line 7179, in async_get_healthy_deployments
raise exception
litellm.types.router.RouterRateLimitError: No deployments available for selected model, Try again in 5 seconds. Passed model=Qwen/Qwen2.5-0.5B-Instruct. pre-call-checks=False, cooldown_list=['3dfdf3fc7bce8b4337b0c174251a6cff311d9f85b7afb2d3214bdbdfecd3b907']
WARNING: Tried calling set_status on an ended span.
WARNING: Calling end() on an ended span.
Error in chat_completion_stream_wrapper: Error code: 404 - {'error': {'message': 'litellm.NotFoundError: NotFoundError: Hosted_vllmException - The model Qwen/Qwen2.5-0.5B-Instruct does not exist.. Received Model Group=Qwen/Qwen2.5-0.5B-Instruct\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '404'}}
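For anyone hitting the 404 above: it is raised by the hosted_vllm backend, which suggests the name LiteLLM routes to ("Qwen/Qwen2.5-0.5B-Instruct") does not match any model id the local OpenAI-compatible server actually exposes. A minimal check, assuming the vLLM server listens on its default port 8000 (adjust the URL to your setup):

```python
import requests

# Assumption: the local vLLM OpenAI-compatible server listens on localhost:8000 (vLLM's default).
resp = requests.get("http://localhost:8000/v1/models", timeout=5)
resp.raise_for_status()

# The "id" fields printed here are the only names the server accepts;
# the model name LiteLLM routes to must match one of them exactly.
for entry in resp.json().get("data", []):
    print(entry["id"])
```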
I think it's related to the local model path. How do you use a local model? What is the configuration, the path, etc.?
As shown in the figure, and based on the source code, I only set the model's path attribute in RL_TRAINING_CONFIG to the model I downloaded from Hugging Face. Do I need any additional configuration, for example for vLLM? Two of my classmates have also run into this issue, specifically in the Spider and Calc_X projects.
"model": { "path": "/home/admin/workspace/aop_lab/app_source/models/Qwen/Qwen2.5-0.5B-Instruct", "use_remove_padding": True, "enable_gradient_checkpointing": True, },
I get the same problem. However, I modified ver/trainer/413 to check whether the original model path or the truncated model path works; neither of them does. So I guess the problem is not the model path.
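To rule the checkpoint itself in or out independently of the router, a quick sanity check (a sketch; the path is the one from the config above) is whether the directory loads as an ordinary Hugging Face checkpoint:

```python
from transformers import AutoConfig, AutoTokenizer

# Assumption: this is the same local directory referenced in RL_TRAINING_CONFIG
# and contains a complete HF checkpoint (config.json, tokenizer files, weights).
path = "/home/admin/workspace/aop_lab/app_source/models/Qwen/Qwen2.5-0.5B-Instruct"

AutoConfig.from_pretrained(path)
AutoTokenizer.from_pretrained(path)
print("checkpoint loads locally; the failure is more likely in the serving/routing layer")
```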
We might need to maintain a local-model example in CI. Putting it into the backlog. Stay tuned.