mlx-llm-server
code 501, message Unsupported method ('GET')
pip install mlx-llm-server
Works fine via curl.
But some apps first make a request to get a list of available models, the same as OpenAI's API does. This causes an issue, and most of those apps respond with a message similar to: "No Models found. This could mean that the connection is not configured correctly or that the vendor did not return any models. If applicable, make sure that CORS is enabled on the vendor's side."
mlx-llm-server --model "mistralai/Mistral-7B-Instruct-v0.2"
Fetching 11 files: 100%|██████████████████████████████████████████| 11/11 [00:00<00:00, 225060.21it/s]
Starting httpd at 127.0.0.1 on port 8080...
127.0.0.1 - - [01/Mar/2024 15:50:45] "POST /v1/chat/completions HTTP/1.1" 200 -
127.0.0.1 - - [01/Mar/2024 15:52:09] "POST /v1/chat/completions HTTP/1.1" 200 -
127.0.0.1 - - [01/Mar/2024 15:56:01] "OPTIONS /api/tags HTTP/1.1" 204 -
127.0.0.1 - - [01/Mar/2024 15:56:01] code 501, message Unsupported method ('GET')
127.0.0.1 - - [01/Mar/2024 15:56:01] "GET /api/tags HTTP/1.1" 501 -
Thoughts?
Thanks
It looks like the app you used requires a GET endpoint at /api/tags, but I couldn't find any specification for that endpoint in OpenAI's API docs. So I suspect it is something specific to the app.
I could not find that endpoint in OpenAI's API docs either, but it is a common request that I see in the logs for Ollama and LM Studio. Once the apps I use get a value back from that GET request, it's just used as a label for the model in use.
After that, all of the requests go to the POST /v1/chat/completions endpoint.
Perhaps I can clone this repo just to handle this GET request and return any/some model name.
My use case is mostly for NovelCrafter and a few RAG setups, which connect to a local chat/inference model such as Mistral, Mixtral, or Westlake. This saves me money compared to using the OpenAI API or Claude, which gets expensive.
I do want to use my new MacBook more, and with MLX ... so thanks for your work.
Hey there, just to weigh in on the conversation, since this is an issue caused by my app: the tags endpoint is specific to Ollama, not OpenAI. However, is there a way to add support for the OpenAI models endpoint? My app relies on the server returning a list of the model(s) people can call. Even if it's just the one loaded at start, that's fine.
I think it should be easy to add a models list endpoint: https://platform.openai.com/docs/api-reference/models/list
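For reference, here is a minimal sketch of what such a handler could look like, assuming the server is built on Python's http.server; SERVED_MODEL stands in for whatever model name the server was started with and is an assumption, not the actual mlx-llm-server code.

# Hypothetical sketch of an OpenAI-style GET /v1/models handler
import json
import time
from http.server import BaseHTTPRequestHandler

SERVED_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder for the loaded model

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/v1/models":
            # Return a single-entry list in OpenAI's models-list format
            body = json.dumps({
                "object": "list",
                "data": [{
                    "id": SERVED_MODEL,
                    "object": "model",
                    "created": int(time.time()),
                    "owned_by": "local",
                }],
            }).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

Even returning just the one model loaded at startup should be enough for clients that only need a label to display.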
By me cloning the repo or will you be adding this? Either way is fine, thanks.
Yes, I can add it if I have some time tomorrow.
I copied the inaccurate logs before; it should have been the OpenAI endpoint and not Ollama, like this:
mlx-llm-server --model "mistralai/Mistral-7B-Instruct-v0.2"
Fetching 11 files: 100%|███████████████████████████████████| 11/11 [00:00<00:00, 56471.66it/s]
Starting httpd at 127.0.0.1 on port 8080...
127.0.0.1 - - [02/Mar/2024 09:11:26] "OPTIONS /v1/models HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 09:11:26] code 501, message Unsupported method ('GET')
127.0.0.1 - - [02/Mar/2024 09:11:26] "GET /v1/models HTTP/1.1" 501 -
... this GET, as you guys pointed out, is in the OpenAI docs/specs. Thanks for doing this, let me know as I will be glad to test it out too.
Would you try running pip install -U mlx-llm-server to see if the updated version fixes the issue?
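As a quick sanity check after upgrading, something like the following should print the loaded model (a hypothetical snippet, assuming the server is still listening on the default 127.0.0.1:8080):

# Hypothetical quick check of the new GET /v1/models endpoint
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8080/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))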
It did fix the GET request and it now returns the correct model id/name, but then this happens:
mlx-llm-server --model "mistralai/Mistral-7B-Instruct-v0.2"
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<00:00, 193043.28it/s]
Starting httpd at 127.0.0.1 on port 8080...
127.0.0.1 - - [02/Mar/2024 16:09:10] "OPTIONS /v1/models HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 16:09:10] "GET /v1/models HTTP/1.1" 200 -
127.0.0.1 - - [02/Mar/2024 16:09:10] "OPTIONS /v1/models HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 16:09:10] "GET /v1/models HTTP/1.1" 200 -
127.0.0.1 - - [02/Mar/2024 16:11:23] "OPTIONS /v1/chat/completions HTTP/1.1" 204 -
127.0.0.1 - - [02/Mar/2024 16:11:23] "POST /v1/chat/completions HTTP/1.1" 200 -
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 55758)
Traceback (most recent call last):
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
self.process_request(request, client_address)
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 347, in process_request
self.finish_request(request, client_address)
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 360, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/socketserver.py", line 747, in __init__
self.handle()
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/http/server.py", line 433, in handle
self.handle_one_request()
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/http/server.py", line 421, in handle_one_request
method()
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/mlx_llm_server/app.py", line 206, in do_POST
response = self.handle_post_request(post_data)
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/mlx_llm_server/app.py", line 217, in handle_post_request
prompt = _tokenizer.apply_chat_template(
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1742, in apply_chat_template
rendered = compiled_template.render(
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/jinja2/environment.py", line 1301, in render
self.environment.handle_exception()
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/jinja2/environment.py", line 936, in handle_exception
raise rewrite_traceback_stack(source=source)
File "<template>", line 1, in top-level template code
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/jinja2/sandbox.py", line 393, in call
return __context.call(__obj, *args, **kwargs)
File "/Users/cleesmith/anaconda3/envs/apple_mlx/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1776, in raise_exception
raise TemplateError(message)
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
----------------------------------------
Perhaps this is related to the app NovelCrafter, so more research is needed, as this still works properly:
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer no-key" \
-d '{
"model": "mistral",
"stop":["<|im_end|>"],
"messages": [
{
"role": "user",
"content": "hi, who are you?"
}
]
}'
{"id": "chatcmpl-33dbad95-d0b4-4a53-b1b4-5a41915ca421", "object": "chat.completion", "created": 1709414336, "model": "mistral", "system_fingerprint": "fp_e721a08c-73ce-40f8-913e-2417e204f144", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello! I'm an AI language model, designed to help answer questions and assist with various tasks. I don't have a physical form or identity, but I'm here to help you in any way I can. How can I assist you today?</s>"}, "logprobs": null, "finish_reason": null}], "usage": {"prompt_tokens": 14, "completion_tokens": 54, "total_tokens": 68}}%
But thank you for your efforts, and here's hoping the added GET will be helpful to others.
The error occurred because the Mistral chat template doesn't support system prompts. However, this error shouldn't cause the request to fail; it should just be a warning. If you try another model that supports system prompts, the error will disappear.
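For anyone hitting this with a client that always sends a system message, here is an illustrative sketch of one possible workaround: fold the system prompt into the first user message before calling apply_chat_template, so the roles strictly alternate. The merge_system_prompt helper below is hypothetical and not part of mlx-llm-server.

# Illustrative workaround: Mistral's chat template rejects a "system" role,
# so merge it into the first user message before templating.
def merge_system_prompt(messages):
    """Return a copy of `messages` with any system message folded into the
    first user message, so roles alternate user/assistant as Mistral expects."""
    merged = []
    system_text = None
    for msg in messages:
        if msg["role"] == "system":
            system_text = msg["content"]
        elif msg["role"] == "user" and system_text is not None:
            merged.append({"role": "user",
                           "content": f"{system_text}\n\n{msg['content']}"})
            system_text = None
        else:
            merged.append(msg)
    return merged

# Example: this message list would otherwise raise
# "Conversation roles must alternate user/assistant/user/assistant/..."
messages = [
    {"role": "system", "content": "You are a helpful writing assistant."},
    {"role": "user", "content": "hi, who are you?"},
]
# prompt = _tokenizer.apply_chat_template(merge_system_prompt(messages),
#                                         tokenize=False,
#                                         add_generation_prompt=True)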