Excessive context size for `list_commits`: the response contains 5-6 KB per commit
Describe the bug
When list_commits() returns 30 results, the follow-up LLM call uses >64k tokens for repos like zalando/skipper. This can easily exceed the model's context window, and users will also run into rate limits from LLM providers:
openai.RateLimitError: Error code: 429 - {'error': {'message': 'Request too large for gpt-4o in organization org-xxx on tokens per min (TPM): Limit 30000, Requested 68490. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
I'm a tier-1 API user for OpenAI.
Affected version
server version v0.1.0 (b89336793c5bc9b9abdd5100d876babbc1031f5d) 2025-04-04T15:38:21
Steps to reproduce the behavior
- Type: "Who is the most frequent committer in github/github-mcp-server? Use list_commits for the output."
- The following function call is executed (due to #136 the actual result set has 30 items):
```
list_commits({
  "owner": "github",
  "repo": "github-mcp-server",
  "perPage": 100
})
```
- The 30 fetched commits exceed the model's context size and/or run into API rate limits, making this function unusable.
Expected vs actual behavior
list_commits() should apply field filtering to the GitHub API response. Currently, all sorts of data for author, committer, and commit.verification (including the signature) are returned, which could be optional.
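For illustration, a projection like the following would be enough for this use case. This is only a sketch; the helper name and the chosen field set are suggestions rather than anything the server currently does (the field names themselves come from the REST list-commits response):

```python
# Sketch: trim one commit object from GET /repos/{owner}/{repo}/commits down to
# a minimal default field set. The selection is a suggestion, not server behavior.
def slim_commit(commit: dict) -> dict:
    return {
        "sha": commit.get("sha"),
        "message": commit.get("commit", {}).get("message"),
        "author_name": commit.get("commit", {}).get("author", {}).get("name"),
        "author_login": (commit.get("author") or {}).get("login"),
        "date": commit.get("commit", {}).get("author", {}).get("date"),
        "html_url": commit.get("html_url"),
    }

# slimmed = [slim_commit(c) for c in response_json]
# -> a few hundred bytes per commit instead of 5-6 KB
```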
Logs
N/A
I think this is a general issue with all the tools. I used list_pull_requests with 14 open pull requests; the context window jumped to 142k tokens and burned $0.60 with Claude Sonnet. After inspecting the response, it returns 384 lines per pull request (too many metadata fields). It would be nice to have control over the returned fields per tool.
I guess a good start would be to apply field-filtering in the MCP server to a sane default (minimal set of fields).
Yes, this is indeed an issue. We are considering several options:
- The server curates the fields that are returned, as some of them are not useful at all.
- Provide lite functions that return fewer fields.
- Extend the tool with an optional parameter that lets the model select which fields to return (à la GraphQL; see the sketch after this list).
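For illustration, option 3 could look roughly like this on the response side. The fields parameter, the dotted-path syntax, and the helper names are hypothetical; this is a sketch, not the server's implementation:

```python
# Sketch: apply a hypothetical "fields" parameter (dotted paths) to each item
# of a tool response before handing it back to the model. Illustrative only.
from typing import Any


def pick(obj: Any, path: str) -> Any:
    """Walk a dotted path like 'commit.author.name'; return None if missing."""
    for key in path.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def select_fields(items: list[dict], fields: list[str]) -> list[dict]:
    return [{path: pick(item, path) for path in fields} for item in items]


# e.g. list_commits(owner, repo, fields=["sha", "commit.message", "commit.author.name"])
# would internally do: select_fields(raw_commits, fields)
```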
Option 3 is the most flexible, but I guess it requires passing the full schema to the model, which can also burn tokens quickly. Another option is to let the user select the fields during MCP setup and default to a minimal set of fields.
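Something like the following setup-time defaults is what I mean. The names and field sets are hypothetical, just to make the idea concrete:

```python
# Hypothetical per-tool field allowlists a user could choose at MCP setup time;
# tools would fall back to these minimal defaults when no fields are requested.
DEFAULT_FIELDS = {
    "list_commits": [
        "sha", "commit.message", "commit.author.name",
        "commit.author.date", "html_url",
    ],
    "list_pull_requests": [
        "number", "title", "state", "user.login",
        "created_at", "html_url",
    ],
}
```

Combined with a selector like the select_fields sketch above, each tool would apply its entry before returning results.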
> - The server curates the fields that are returned, as some of them are not useful at all.

Good first step in my view, as many links and detailed fields can be dropped.

> - Provide lite functions that return fewer fields.

If this causes the number of functions to double, then option 1 is better.

> - Extend the tool with an optional parameter that lets the model select which fields to return (à la GraphQL).

Would it help if the custom instructions passed to the LLM included a planning step that lists the fields required to perform the query? This set of fields could then extend the default. Otherwise, I wonder whether the models already know how to query GitHub, since the GitHub API docs are most likely part of the training data. The LLM could generate GraphQL queries for the user request, which are then verified, extended, and executed by the MCP server.
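For example, a query along these lines could answer the original "most frequent committer" request. The query shape follows the public GitHub GraphQL schema as I understand it, so treat it as a sketch the server would still need to verify, not a tested implementation:

```python
# Sketch: execute an LLM-generated GraphQL query against the GitHub GraphQL API
# and tally commit authors. Query shape assumed from the public schema docs.
import os
from collections import Counter

import requests

QUERY = """
query {
  repository(owner: "github", name: "github-mcp-server") {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 100) {
            nodes { author { name user { login } } }
          }
        }
      }
    }
  }
}
"""

resp = requests.post(
    "https://api.github.com/graphql",
    json={"query": QUERY},
    headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
    timeout=30,
)
nodes = resp.json()["data"]["repository"]["defaultBranchRef"]["target"]["history"]["nodes"]
print(Counter(n["author"]["name"] for n in nodes).most_common(5))
```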
This is mostly an issue with the context window, so I have changed the label. We are actively discussing how we can make this better, so I'll leave this open and we will revisit when we are ready to offer solutions.
If it's a blocker for anyone, in the meantime you can switch tooling and instruct the LLM to use the gh CLI and then filter the response through jq to effectively limit what ends up in the context window.
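For example, a small wrapper like this keeps only a few fields per commit before anything reaches the model. It is a sketch: the repo, the chosen fields, and the jq expression are just examples.

```python
# Sketch of the workaround: call the gh CLI and let jq (via gh's --jq flag)
# keep only a few fields per commit. Fetches the default page of 30 commits.
import json
import subprocess

result = subprocess.run(
    [
        "gh", "api", "repos/github/github-mcp-server/commits",
        "--jq", "[.[] | {sha, author: .commit.author.name, message: .commit.message}]",
    ],
    capture_output=True, text=True, check=True,
)
commits = json.loads(result.stdout)
print(len(commits), "commits,", len(result.stdout), "bytes after filtering")
```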
This issue is stale because it has been open for 60 days with no activity. Leave a comment to avoid closing this issue in 10 days.
Large output from the MCP server often pollutes the LLM's context window, so this is still an issue, not a stale one.
This issue is stale because it has been open for 60 days with no activity. Leave a comment to avoid closing this issue in 120 days.
Still valid.
Agreed, I hope we can get to this soon!
This issue is stale because it has been open for 30 days with no activity. Leave a comment to avoid closing this issue in 60 days.
Still valid.