
Excessive context size for `list_commits`: response object has 5-6KB per commit

Open bocytko opened this issue 8 months ago • 14 comments

Describe the bug

When list_commits() returns 30 results for a repo like zalando/skipper, the follow-up LLM call uses >64k tokens. This easily exceeds the context window of LLMs, and users will also run into rate limits from LLM providers:

openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o in organization org-xxx on tokens per min (TPM): Limit 30000, Requested 68490. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

I'm a tier-1 API user for OpenAI.

Affected version

server version v0.1.0 (b89336793c5bc9b9abdd5100d876babbc1031f5d) 2025-04-04T15:38:21

Steps to reproduce the behavior

  1. Type "Who is the most frequent committer in github/github-mcp-server? Use list_commits for the output."
  2. The function call is executed (due to #136 the actual result set has 30 items):
list_commits({
  "owner": "github",
  "repo": "github-mcp-server",
  "perPage": 100
})
  3. The 30 fetched commits exceed the context size of the model and/or hit API rate limits, making this function unusable.

Expected vs actual behavior

list_commits() should apply field filtering to the GitHub API response. Currently, all sorts of data for author, committer, and commit.verification (incl. the signature) are returned that could be optional.
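A minimal sketch of what such a whitelist filter could look like on the server side, assuming the raw REST response is decoded into generic maps (illustrative only, not the server's actual implementation; the kept fields are just one possible default):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// trimCommit keeps only the fields an LLM typically needs for
// "who committed what, and when" style questions.
func trimCommit(full map[string]any) map[string]any {
	trimmed := map[string]any{"sha": full["sha"]}
	if commit, ok := full["commit"].(map[string]any); ok {
		if author, ok := commit["author"].(map[string]any); ok {
			trimmed["author"] = map[string]any{
				"name": author["name"],
				"date": author["date"],
			}
		}
		trimmed["message"] = commit["message"]
	}
	return trimmed
}

func main() {
	// raw stands in for the body of GET /repos/{owner}/{repo}/commits.
	raw := []byte(`[{"sha":"b8933679","commit":{"author":{"name":"Jane","date":"2025-04-04T15:38:21Z"},"message":"initial commit","verification":{"verified":false}},"html_url":"https://github.com/github/github-mcp-server/commit/b8933679"}]`)

	var commits []map[string]any
	if err := json.Unmarshal(raw, &commits); err != nil {
		panic(err)
	}

	slim := make([]map[string]any, 0, len(commits))
	for _, c := range commits {
		slim = append(slim, trimCommit(c))
	}

	out, _ := json.Marshal(slim)
	fmt.Println(string(out)) // a few hundred bytes per commit instead of 5-6KB
}
```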

Logs

N/A

bocytko avatar Apr 06 '25 22:04 bocytko

I think this is a general issue with all the tools. I used list_pull_requests with 14 open pull requests, the context window jumped to 142k tokens, and it burned $0.60 with Claude Sonnet. After inspecting the response, it returns 384 lines per pull request (too many metadata fields). It would be nice to have control over the returned fields per tool.

cufeo avatar Apr 07 '25 06:04 cufeo

It would be nice to have control over the returned fields per tool.

I guess a good start would be to apply field-filtering in the MCP server to a sane default (minimal set of fields).

bocytko avatar Apr 07 '25 12:04 bocytko

It would be nice to have control over the returned fields per tool.

I guess a good start would be to apply field-filtering in the MCP server to a sane default (minimal set of fields).

Yes, this is indeed an issue. We are considering several options:

  1. The server can curate the fields that are returned, as some fields are not useful at all.
  2. Provide lite functions where fewer fields are returned.
  3. Extend the tool with an optional parameter that allows the model to select which fields to return (à la GraphQL).
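
For illustration, option 3 could look something like the call below; the `fields` parameter is hypothetical and does not exist today:

```
list_commits({
  "owner": "github",
  "repo": "github-mcp-server",
  "perPage": 100,
  "fields": ["sha", "commit.author.name", "commit.author.date", "commit.message"]
})
```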

juruen avatar Apr 07 '25 13:04 juruen

It would be nice to have control over the returned fields per tool.

I guess a good start would be to apply field-filtering in the MCP server to a sane default (minimal set of fields).

Yes, this is indeed an issue. We are considering several options:

  1. The server can curate the fields that are returned, as some fields are not useful at all.
  2. Provide lite functions where fewer fields are returned.
  3. Extend the tool with an optional parameter that allows the model to select which fields to return (à la GraphQL).

Option 3 is the most flexible, but I guess it requires passing the full schema to the model, which can also burn tokens quickly. Another option is to let the user select the fields during the MCP setup, and default to a minimal set of fields.
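
Purely as a sketch of that idea (no such configuration option exists in the server today), a per-tool field allowlist chosen at MCP setup time might look like:

```json
{
  "toolDefaults": {
    "list_commits": {
      "fields": ["sha", "commit.author.name", "commit.author.date", "commit.message"]
    }
  }
}
```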

cufeo avatar Apr 07 '25 13:04 cufeo

Yes, this is indeed an issue. We are considering several options:

  1. The server can curate the fields that are returned, as some fields are not useful at all.

A good first step in my view, as many links and detailed fields can be dropped.

  2. Provide lite functions where fewer fields are returned.

If this causes the number of functions to double, then option 1 is better.

  3. Extend the tool with an optional parameter that allows the model to select which fields to return (à la GraphQL).

Would it help if the custom instructions passed to the LLM included a planning step that lists the fields required to perform the query? This set of fields could then extend the default. Otherwise, I wonder whether the models already know how to query GitHub, as the GitHub API docs are most likely part of the training data. The LLM could generate GraphQL queries for the user request, which are then verified, extended, and executed by the MCP server.
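
As an illustration of that last idea, the example prompt from the reproduction steps maps to a GraphQL query that only fetches the fields actually needed (a sketch; the server does not generate such queries today):

```graphql
query MostFrequentCommitter {
  repository(owner: "github", name: "github-mcp-server") {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 100) {
            nodes {
              author { name }
            }
          }
        }
      }
    }
  }
}
```

Counting the returned author names then answers the question without pulling verification data, URLs, or avatars into the context.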

bocytko avatar Apr 09 '25 21:04 bocytko

This is mostly an issue with the context window, so I have changed the label. We are actively discussing how we can make this better, so I'll leave this open and we will revisit it when we are ready to offer solutions.

SamMorrowDrums avatar Apr 23 '25 13:04 SamMorrowDrums

If it's a blocker for anyone, in the meantime you can switch tooling and instruct the LLM to use the gh CLI and then filter the response through jq to effectively limit what ends up in the context window.
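
For example, something along these lines keeps only a handful of fields per commit (field names follow the REST list-commits response; adjust to taste):

```bash
# List commits via the GitHub REST API and keep only a few fields per commit,
# so that only the slimmed-down JSON ends up in the model's context.
gh api 'repos/github/github-mcp-server/commits?per_page=30' \
  --jq '[.[] | {sha, author: .commit.author.name, date: .commit.author.date, message: .commit.message}]'
```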

ThomSMG avatar May 16 '25 04:05 ThomSMG

This issue is stale because it has been open for 60 days with no activity. Leave a comment to avoid closing this issue in 10 days.

github-actions[bot] avatar Aug 13 '25 08:08 github-actions[bot]

Large output from the MCP server often pollutes the context window of the LLM, so this is still an issue and not a stale one.

gauravkumar37 avatar Aug 13 '25 09:08 gauravkumar37

This issue is stale because it has been open for 60 days with no activity. Leave a comment to avoid closing this issue in 120 days.

github-actions[bot] avatar Oct 14 '25 08:10 github-actions[bot]

Still valid.

bocytko avatar Oct 15 '25 13:10 bocytko

Agreed, I hope we can get to this soon!

SamMorrowDrums avatar Oct 15 '25 21:10 SamMorrowDrums

This issue is stale because it has been open for 30 days with no activity. Leave a comment to avoid closing this issue in 60 days.

github-actions[bot] avatar Nov 15 '25 08:11 github-actions[bot]

Still valid.

bocytko avatar Nov 16 '25 14:11 bocytko