idea: get the currently loaded model(s) through an API endpoint
Problem Statement
Is it possible to get the currently loaded model(s) in Jan through an API endpoint? The current "http://localhost:1337/v1/models" API shows all "available" models.
Feature Idea
Provide a new API endpoint if necessary.
Jan is beautifully built. The following is copied from the OpenAI docs for quick reference. In my experience integrating other LLM servers/gateways, the API returns the "currently" available models. As soon as this functionality is added to Jan, I'd be happy to add a demo for Jan on GPTLocalhost so that users have another reason to give Jan a try. Thank you for your consideration.
Lists the currently available models, and provides basic information about each one such as the owner and availability.
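For concreteness, a minimal example of the existing call (whether the Authorization header is required depends on your Jan settings):

```sh
# Lists every model Jan knows about, downloaded or not -
# not only the model(s) currently loaded into memory.
curl http://localhost:1337/v1/models \
  -H "Authorization: Bearer $JAN_API_KEY"
```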
UPDATE:
Jan should expose an endpoint for the currently loaded models (with a better API path)
/inferences/server/models
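A sketch of what the proposed call could look like; the path comes from the update above, while the header and the OpenAI-style response shape are assumptions:

```sh
# Hypothetical: return only the model(s) currently loaded by the server.
curl http://localhost:1337/inferences/server/models \
  -H "Authorization: Bearer $JAN_API_KEY"
# Assumed OpenAI-style list response:
# {"object":"list","data":[{"id":"gpt-oss-20b","object":"model"}]}
```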
Thanks for the update. May I confirm whether it has been implemented? I tried the latest version (v0.5.16) but found the above API does not exist.
Hi @GPTLocalhost I will take a look and try to sneak the update into the upcoming release. For now it's supported but not exposed for public access (API key restricted by cortex.cpp).
It should be supported via this endpoint. Unfortunately, there's a bug introduced when we filter out undownloaded models from the hub. cc @samhvw8 curl http://localhost:1337/inferences/server/models -H "Authorization:...
Thank you for the update. Unfortunately, our attempt yesterday was unsuccessful. Once the bug is resolved, we intend to produce a demo showcasing how to use Jan with Microsoft Word as planned.
Additionally, we're curious about any future plans to integrate Apple Intelligence within Jan. We already have a similar demo as below and would appreciate it if Jan could access Apple's Foundation Language Models.
https://youtu.be/BBr2gPr-hwA
@samhvw8 can you verify if this endpoint works? If so, pls close this issue. If not, please turn this into a bug and add it to your backlog 🙏
@freelerobot yes, it's a bug right now, let me fix it soon
Hi @GPTLocalhost can I check with you whether this issue is still present?
"http://localhost:1337/v1/models" works now, with API key. Thank you very much. However, I tested "/completions" using gpt-oss-20b and it shows the following. Is there any way to filter out the tags and reasoning traces?
<|channel|>analysis<|message|>We need to rewrite a simple test message. It's a very short text: "This is a test message." We need to rewrite it to be more simple, clear, better flow. But it's already simple. Maybe rewrite as "This is a test." Or "Here is a test message." But the instruction: rewrite the following text to make it more simple, clear, with better flow. So we can produce a simple, clear sentence. Let's produce: "This is a test message." Already simple. Maybe "This is a test." The content might remain same meaning. Provide alternative? Provide rewrite. We'll deliver.<|start|>assistant<|channel|>final<|message|>This is a test.
hey @GPTLocalhost if you go through the API layer, it's up to you to parse that information yourself, because llama.cpp upstream doesn't properly implement the reasoning_content field yet.
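For now, a minimal client-side workaround could look like this (a sketch, assuming the output uses the Harmony-style tokens from your paste; the model id and key are placeholders from this thread):

```sh
# Keep only the text after the final-channel marker, dropping the
# analysis (reasoning) channel that precedes it. If the marker is
# absent, split() yields one element and the full text passes through.
curl -s http://localhost:1337/v1/completions \
  -H "Authorization: Bearer $JAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-20b", "prompt": "Say hello.", "max_tokens": 128}' \
  | jq -r '.choices[0].text | split("<|channel|>final<|message|>") | last'
```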
But thanks for the update, I will close this bug now
I see. Thank you for the info.