
MCP Server: DEFAULT_MAX_TOKENS 8192 is too restrictive and frequently causes errors

Open · zgqq opened this issue 5 months ago • 8 comments

Problem Description

The MCP server currently has a hardcoded DEFAULT_MAX_TOKENS = 8192 limit in graphiti_core/llm_client/config.py, which is too restrictive for many real-world use cases and frequently causes errors when processing longer content or complex graph operations.

Current Implementation

In graphiti_core/llm_client/config.py:

DEFAULT_MAX_TOKENS = 8192
DEFAULT_TEMPERATURE = 0

class LLMConfig:
    def __init__(
        self,
        api_key: str | None = None,
        model: str | None = None,
        base_url: str | None = None,
        temperature: float = DEFAULT_TEMPERATURE,
        max_tokens: int = DEFAULT_MAX_TOKENS,  # <-- This defaults to 8192
        small_model: str | None = None,
    ):

Issues with Current Limit

  1. Too Restrictive: 8192 tokens is often insufficient for:

    • Processing longer documents or episodes
    • Complex entity extraction operations
    • Detailed fact generation from rich content
    • Summary generation for large knowledge graphs
  2. Frequent Errors: Users encounter token limit exceeded errors when:

    • Adding episodes with substantial content
    • Processing JSON data with nested structures
    • Performing entity resolution on complex datasets
  3. No Configuration Flexibility: The current implementation doesn't allow users to configure this limit without modifying the source code (a workaround sketch follows this list).
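
For illustration, the only workaround today is to pass an explicit max_tokens when constructing LLMConfig yourself, which doesn't help when the config is built internally by the MCP server. A minimal sketch based on the constructor shown above; the 32768 value and placeholder key are purely illustrative:

from graphiti_core.llm_client.config import LLMConfig

# Workaround: override the 8192 default explicitly when you control
# config construction yourself (32768 is an example value, not a recommendation).
config = LLMConfig(
    api_key='sk-...',   # placeholder key
    max_tokens=32768,   # overrides DEFAULT_MAX_TOKENS = 8192
)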

Proposed Solutions

Option 1: Increase Default Value

Increase DEFAULT_MAX_TOKENS to a more reasonable default like 16384 or 32768 to accommodate modern LLM capabilities and typical use cases.

Option 2: Make it Configurable via Environment Variable

import os

DEFAULT_MAX_TOKENS = int(os.getenv('GRAPHITI_MAX_TOKENS', '16384'))
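
A user could then set, for example, GRAPHITI_MAX_TOKENS=32768 in the environment before starting the MCP server, and the new default would be picked up at import time with no code changes.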

Option 3: Add MCP Server CLI Argument

Add a --max-tokens argument to the MCP server configuration:

parser.add_argument(
    '--max-tokens',
    type=int,
    default=16384,
    help='Maximum tokens for LLM requests (default: 16384)'
)
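
The parsed value would then need to be threaded into the LLM client configuration. A rough sketch of what that wiring could look like in the MCP server's startup code; the parser setup and variable names here are assumptions, not the server's actual structure:

import argparse

from graphiti_core.llm_client.config import LLMConfig

parser = argparse.ArgumentParser()
parser.add_argument(
    '--max-tokens',
    type=int,
    default=16384,
    help='Maximum tokens for LLM requests (default: 16384)',
)
args = parser.parse_args()

# Hand the CLI value through to the LLM client configuration.
llm_config = LLMConfig(max_tokens=args.max_tokens)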

Option 4: Model-Specific Defaults

Set different defaults based on the model being used, as different models have different context windows:

  • GPT-4: 32768 tokens
  • GPT-4-turbo: 128000 tokens
  • Claude-3: 200000 tokens
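
A sketch of how model-specific defaults could be expressed, using the values listed above; the mapping, fallback value, and helper function name are illustrative assumptions rather than existing Graphiti code:

# Illustrative mapping of model-name prefixes to default max_tokens.
# Longer prefixes come first so 'gpt-4-turbo' matches before 'gpt-4'.
MODEL_DEFAULT_MAX_TOKENS = {
    'gpt-4-turbo': 128000,
    'gpt-4': 32768,
    'claude-3': 200000,
}

FALLBACK_MAX_TOKENS = 16384

def default_max_tokens(model: str | None) -> int:
    """Pick a per-model default, falling back to a generic value."""
    if model:
        for prefix, limit in MODEL_DEFAULT_MAX_TOKENS.items():
            if model.startswith(prefix):
                return limit
    return FALLBACK_MAX_TOKENS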

Environment

  • Graphiti Version: Latest (as of July 2025)
  • Component: MCP Server
  • File: graphiti_core/llm_client/config.py

Impact

This change would:

  • ✅ Reduce user frustration from frequent token limit errors
  • ✅ Support more complex use cases out of the box
  • ✅ Better utilize modern LLM capabilities
  • ✅ Maintain backward compatibility if implemented as configurable

Additional Context

Similar issues have been reported in the past regarding rate limits and token handling (see issues #290, #508, #544), indicating that token management is a common pain point for users.

The current 8192 limit appears to be a conservative default that doesn't reflect the capabilities of modern LLMs or typical user requirements when working with knowledge graphs.

zgqq · Jul 15, 2025

Hello moderators, can I pick up this issue?

hemant-mistry · Jul 15, 2025

I'm fine with this being increased to 16384. Note that individual LLMProviders implement their own max_tokens.

danielchalef · Jul 15, 2025

I'm fine with this being increased to 16384. Note that individual LLMProviders implement their own max_tokens.

Though I increased max_tokens to 16384 and even 32768, I still get this error: "Retrying after application error (attempt 1/2): Output length exceeded max tokens 16384/32768: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16384, prompt_tokens=1812, total_tokens=..., completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=None, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=None), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))". In both cases completion_tokens hits the configured max_tokens (16384 or 32768), and adding prompt_tokens on top pushes the total past the limit.

soberyyz · Jul 31, 2025

I am also getting a similar error for max_tokens.

look4pritam · Aug 14, 2025

@zgqq Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] · Oct 6, 2025

@zgqq Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] · Oct 22, 2025

@zgqq Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] · Oct 29, 2025

@zgqq Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] · Nov 17, 2025