VS Code LM API uses too many requests
Description
When using the VS Code LM API provider, tool/assistant calls are not being identified correctly and count against the request quota.
To reproduce, make any prompt that initiates a tool call (e.g. "summarize the contents of this folder") and observe the excessive request usage.
Instead, the provider should set the "X-Initiator" header to "agent" for those calls.
See the similar issue in opencode: https://github.com/sst/opencode/issues/430 and the fix: https://github.com/sst/opencode/pull/595
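Something along these lines, modeled on the linked opencode patch, might work in Kilo's provider. The message shape, endpoint plumbing, and the tool-role heuristic below are my own assumptions for illustration; only the `X-Initiator: agent` header value comes from the linked fix:

```typescript
// Hedged sketch: mark automated follow-up requests (tool-call continuations)
// so the backend can distinguish them from user-initiated turns.
// Only the "X-Initiator" header value is taken from the opencode fix;
// everything else (types, endpoint, heuristic) is illustrative.

type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };

async function sendChatRequest(
  endpoint: string,
  apiKey: string,
  messages: ChatMessage[]
): Promise<Response> {
  // Heuristic (assumption): if the history already contains a tool result,
  // this request is an agent-initiated continuation, not a fresh user prompt.
  const isAgentTurn = messages.some((m) => m.role === "tool");

  return fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
      // "agent" marks automated follow-ups; "user" marks the initial prompt.
      "X-Initiator": isAgentTurn ? "agent" : "user",
    },
    body: JSON.stringify({ messages }),
  });
}
```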
Thank you for the report! Sadly, the VS Code LM API is in a "very experimental" state, so I'm not surprised at all.
Which provider/model are you using?
The model doesn't seem to matter. I've tried GPT 4.1, o3, and Gemini Pro.
I'm not seeing the same behavior. I spent much of today and yesterday using the LM API-based mechanism with Kilo, and I'm not seeing anything recorded as premium request usage when hitting the included models.
Forget what I said about the included model GPT 4.1. You're probably right that it doesn't move the needle; this is kind of an inexact process.
I mean, I submitted literally hundreds of requests yesterday via the LM API to 4.1 and 4o models. I would have blown through my premium limit.
GPT-4.1 and GPT-4o are not counted as premium requests by GitHub Copilot. Models like Claude, Gemini, or GPT-5 will reduce the premium quota.
I used the Copilot premium model Claude 4 to ask about the purpose of a simple 26-line tsconfig.json file, and Kilo used two premium requests when it could have used only one. I believe one was used for the AI response and the second for task completion. Task completion could be handled by other cost-efficient models that aggregate the results of a task. Perhaps in Kilo's settings, users could set a particular model of their choice for task completion?
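For illustration, a rough sketch of how such a setting might be read inside the extension. The `kilo.taskCompletionModel` key, the configuration section name, and the model IDs are all hypothetical, not Kilo's actual configuration:

```typescript
// Hypothetical sketch: route "task completion" summaries to a cheaper model
// while the main response keeps using the premium one.
import * as vscode from "vscode";

function pickModelId(purpose: "response" | "taskCompletion"): string {
  // "kilo" as a settings section is an assumption for this sketch.
  const config = vscode.workspace.getConfiguration("kilo");
  if (purpose === "taskCompletion") {
    // e.g. "kilo.taskCompletionModel": "gpt-4.1" (not billed as premium)
    return config.get<string>("taskCompletionModel", "gpt-4.1");
  }
  return config.get<string>("model", "claude-sonnet-4");
}
```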