
No deployments available for selected model

Open Hlcnn opened this issue 1 month ago • 4 comments

Thank you for your work. I am running the spider and calc_x tasks, and after specifying a local model, I consistently encounter the errors "No deployments available for selected model" or "The model Qwen/Qwen2.5-0.5B-Instruct does not exist." My agentlightning is installed from the latest source, with verl version 0.6.0 and vllm version 0.10.2. The full traceback points to a litellm.types.router.RouterRateLimitError and a subsequent NotFoundError related to the model's availability or deployment via the LiteLLM router. Could you please advise on the proper configuration steps for using this local model, as the current setup seems to be failing to connect or recognize the deployment?

```text
ERROR: Exception occured - No deployments available for selected model, Try again in 5 seconds. Passed model=Qwen/Qwen2.5-0.5B-Instruct. pre-call-checks=False, cooldown_list=['3dfdf3fc7bce8b4337b0c174251a6cff311d9f85b7afb2d3214bdbdfecd3b907']
Traceback (most recent call last):
  File "litellm/proxy/proxy_server.py", line 4782, in chat_completion
    result = await base_llm_response_processor.base_process_llm_request(
  File "litellm/proxy/common_request_processing.py", line 502, in base_process_llm_request
    responses = await llm_responses
  File "litellm/router.py", line 1093, in acompletion
    raise e
  File "litellm/router.py", line 1069, in acompletion
    response = await self.async_function_with_fallbacks(**kwargs)
  File "litellm/router.py", line 4037, in async_function_with_fallbacks
    return await self.async_function_with_fallbacks_common_utils(
  File "litellm/router.py", line 3995, in async_function_with_fallbacks_common_utils
    raise original_exception
  File "litellm/router.py", line 4029, in async_function_with_fallbacks
    response = await self.async_function_with_retries(*args, **kwargs)
  File "litellm/router.py", line 4151, in async_function_with_retries
    self.should_retry_this_error(
  File "litellm/router.py", line 4350, in should_retry_this_error
    raise error
  File "litellm/router.py", line 4125, in async_function_with_retries
    response = await self.make_call(original_function, *args, **kwargs)
  File "litellm/router.py", line 4245, in make_call
    response = await response
  File "litellm/router.py", line 1372, in _acompletion
    raise e
  File "litellm/router.py", line 1246, in _acompletion
    deployment = await self.async_get_available_deployment(
  File "litellm/router.py", line 7337, in async_get_available_deployment
    raise e
  File "litellm/router.py", line 7229, in async_get_available_deployment
    healthy_deployments = await self.async_get_healthy_deployments(
  File "litellm/router.py", line 7179, in async_get_healthy_deployments
    raise exception
litellm.types.router.RouterRateLimitError: No deployments available for selected model, Try again in 5 seconds. Passed model=Qwen/Qwen2.5-0.5B-Instruct. pre-call-checks=False, cooldown_list=['3dfdf3fc7bce8b4337b0c174251a6cff311d9f85b7afb2d3214bdbdfecd3b907']
WARNING: Tried calling set_status on an ended span.
WARNING: Calling end() on an ended span.
Error in chat_completion_stream_wrapper: Error code: 404 - {'error': {'message': 'litellm.NotFoundError: NotFoundError: Hosted_vllmException - The model Qwen/Qwen2.5-0.5B-Instruct does not exist.. Received Model Group=Qwen/Qwen2.5-0.5B-Instruct\nAvailable Model Group Fallbacks=None', 'type': None, 'param': None, 'code': '404'}}
```
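For context: the RouterRateLimitError above means the LiteLLM router found no healthy deployment whose model_name matches the requested model, and the 404 that follows shows the vLLM backend rejecting the inner model ID, which puts the deployment on the cooldown list. A minimal sketch of how a deployment against a local vLLM server is typically declared (the endpoint URL and the served model ID here are assumptions for illustration, not values from this thread):

```python
import litellm

# Hypothetical router setup; api_base and the model ID are assumptions,
# not taken from the agent-lightning configuration in this issue.
router = litellm.Router(
    model_list=[
        {
            # The name clients request must match this exactly, or the
            # router reports "No deployments available for selected model".
            "model_name": "Qwen/Qwen2.5-0.5B-Instruct",
            "litellm_params": {
                # The hosted_vllm/ prefix routes the call to an
                # OpenAI-compatible vLLM server.
                "model": "hosted_vllm/Qwen/Qwen2.5-0.5B-Instruct",
                "api_base": "http://localhost:8000/v1",
            },
        }
    ]
)

response = router.completion(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "ping"}],
)
```

If the backend 404s on the inner model name, LiteLLM cools that deployment down, and subsequent requests then fail with the RouterRateLimitError shown above.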

Hlcnn avatar Nov 11 '25 03:11 Hlcnn

I think it's related to the local model path. How do you use the local model? What's the configuration path, etc.?

ultmaster avatar Nov 11 '25 08:11 ultmaster

> I think it's related to the local model path. How do you use the local model? What's the configuration path, etc.?

As shown in the config snippet below, following the source code I only set the model's path attribute in RL_TRAINING_CONFIG to the model I downloaded from Hugging Face. Do I need any additional configuration, for example for vLLM? Two of my classmates have also run into this issue, specifically in the Spider and Calc_X projects.

"model": { "path": "/home/admin/workspace/aop_lab/app_source/models/Qwen/Qwen2.5-0.5B-Instruct", "use_remove_padding": True, "enable_gradient_checkpointing": True, },

Hlcnn avatar Nov 11 '25 09:11 Hlcnn

I get the same problem. However, I modified verl/trainer at line 413 to check whether the original model path or the truncated model path works; neither of them does. So I guess the problem is not the model path.
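One quick check that sidesteps the trainer entirely is to ask the rollout's OpenAI-compatible endpoint which model IDs it actually serves. A sketch, assuming the vLLM server is reachable at localhost:8000 (an assumption; substitute the real host and port):

```python
import requests

# List the model IDs the vLLM OpenAI-compatible server exposes.
# Host and port are placeholders; use the actual rollout server address.
resp = requests.get("http://localhost:8000/v1/models", timeout=10)
resp.raise_for_status()
served = [m["id"] for m in resp.json()["data"]]
print(served)

# If "Qwen/Qwen2.5-0.5B-Instruct" is not in this list, the 404
# ("The model ... does not exist") is coming from the server side,
# not from the model path on disk.
```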

XianglongTan avatar Nov 12 '25 03:11 XianglongTan

We might need to maintain a local-model example on CI. Putting it into the backlog. Stay tuned.

ultmaster avatar Nov 12 '25 04:11 ultmaster