
Refactor: Add Embedding Token Limit Configuration and Improve Error Handling

Open danielaskdd opened this pull request 2 weeks ago • 4 comments

Summary

This PR enhances the embedding functionality with configurable token limits, improved error handling, and better reliability. It consists of three related improvements that work together to provide a more robust embedding experience.

Motivation

The current implementation has several limitations:

  1. Token Overflow Risk: LLM-generated summaries could exceed embedding model token limits (e.g., bge-m3's 8192 token limit), causing silent failures
  2. Inconsistent Error Handling: Bedrock embedding errors lacked proper retry mechanisms and specific exception types
  3. Missing Token Constraints: No way to configure or validate token limits for different embedding models

Changes

Commit 1: Add max_token_size parameter to embedding function decorators

  • Added max_token_size=8192 parameter to all embedding function decorators (see the sketch after this list)
  • Moved siliconcloud implementation to deprecated folder
  • Imported wrap_embedding_func_with_attrs for consistent decorator usage
  • Updated EmbeddingFunc docstring with parameter documentation
  • Fixed langfuse import type annotation

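A minimal sketch of the decorator usage this commit standardizes on, assuming the usual lightrag.utils import path; the function body and embedding dimension are illustrative placeholders, not code from the PR:

# Hedged sketch: wrap_embedding_func_with_attrs attaches embedding_dim and
# max_token_size metadata to the wrapped embedding function.
import numpy as np
from lightrag.utils import wrap_embedding_func_with_attrs

@wrap_embedding_func_with_attrs(embedding_dim=1024, max_token_size=8192)
async def example_embed(texts: list[str]) -> np.ndarray:
    # Call the provider's embedding API here; zeros stand in for real vectors.
    return np.zeros((len(texts), 1024))
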
Commit 2: Improve Bedrock error handling with retry logic and custom exceptions

  • Added specific exception types for better error classification
  • Implemented a proper retry mechanism with exponential backoff (see the sketch after this list)
  • Enhanced error logging and validation
  • Enabled the embedding retry decorator for resilience
  • Distinguishes between retryable and non-retryable errors
  • Provides detailed error context in logs
  • Automatically retries transient failures
  • Better debugging capabilities

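A rough sketch of the retry pattern described above, written with tenacity; the exception names, attempt counts, and back-off values are illustrative, not the PR's exact code:

# Hedged sketch: retry only transient failures, with exponential backoff.
import logging
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

class BedrockEmbeddingError(Exception):
    """Non-retryable embedding failure (illustrative name)."""

class BedrockTransientError(BedrockEmbeddingError):
    """Retryable, transient failure such as throttling (illustrative name)."""

@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    retry=retry_if_exception_type(BedrockTransientError),
)
async def bedrock_embed(texts: list[str]) -> list[list[float]]:
    try:
        return await _call_bedrock(texts)  # placeholder for the real Bedrock runtime call
    except TimeoutError as e:
        logger.warning("Bedrock embedding timed out, retrying: %s", e)
        raise BedrockTransientError(str(e)) from e
    except ValueError as e:
        logger.error("Bedrock embedding request invalid, not retrying: %s", e)
        raise BedrockEmbeddingError(str(e)) from e

async def _call_bedrock(texts: list[str]) -> list[list[float]]:
    raise NotImplementedError  # stand-in for the actual boto3 invoke_model call
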
Commit 3: Add configurable embedding token limit with validation

  • Added EMBEDDING_TOKEN_LIMIT environment variable support
  • Automatically sets max_token_size on embedding function initialization (see the sketch after this list)
  • Added embedding_token_limit property to LightRAG class
  • Implemented summary length validation against token limit
  • Logs warning when summary exceeds 90% of token limit

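A small sketch of that initialization wiring, i.e. copying the configured limit onto the embedding function at startup; the helper name is hypothetical and the real code lives in lightrag/api/lightrag_server.py:

# Hedged sketch: read EMBEDDING_TOKEN_LIMIT and apply it to the embedding function.
import os

def apply_embedding_token_limit(embedding_func):
    """Illustrative helper: set max_token_size from the environment, if configured."""
    raw_limit = os.getenv("EMBEDDING_TOKEN_LIMIT")
    if raw_limit:
        embedding_func.max_token_size = int(raw_limit)
    return embedding_func
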
Technical Implementation

  1. Environment Configuration (lightrag/api/config.py)
    • Read EMBEDDING_TOKEN_LIMIT as an integer (optional)
  2. Server Initialization (lightrag/api/lightrag_server.py)
    • Pass max_token_size to the embedding function if configured
  3. LightRAG Property (lightrag/lightrag.py)
    • Safe accessor for the embedding token limit on the embedding function
  4. Validation (lightrag/operate.py)
    • Check summary tokens in _summarize_descriptions
    • Log a detailed warning when the 90% threshold is exceeded (see the sketch after this list)

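A sketch of the 90% threshold check performed during summarization; apart from the 90% figure and the idea of an optional limit, the names below are illustrative stand-ins for the code in lightrag/operate.py:

# Hedged sketch of the validation step described above.
import logging

logger = logging.getLogger(__name__)

def check_summary_tokens(summary_tokens: int, token_limit: int | None) -> None:
    if token_limit is None:
        return  # graceful degradation: nothing to validate when no limit is set
    if summary_tokens > 0.9 * token_limit:
        logger.warning(
            "Summary uses %d tokens, above 90%% of the embedding token limit (%d); "
            "it may be truncated by the embedding model",
            summary_tokens,
            token_limit,
        )
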
Usage Example

# Set embedding token limit in .env file
EMBEDDING_TOKEN_LIMIT=8192

# Or as environment variable
export EMBEDDING_TOKEN_LIMIT=8192

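Once configured, the limit can be read back through the new property on the LightRAG instance; a minimal sketch (constructor arguments are omitted for brevity, and only embedding_token_limit comes from this PR):

# Hedged sketch: the property exposes the embedding function's token limit, or None.
from lightrag import LightRAG

rag = LightRAG(working_dir="./rag_storage")  # plus your usual LLM/embedding setup
if rag.embedding_token_limit is not None:
    print(f"Embedding token limit: {rag.embedding_token_limit}")
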
Technical Details

  • Backward Compatibility: All changes maintain full backward compatibility
  • Safety Margin: 90% threshold provides buffer before hard limit
  • Graceful Degradation: System continues to function when limit not configured
  • Comprehensive Logging: Detailed warnings help identify potential issues early

Testing

  • [x] Tested with various EMBEDDING_TOKEN_LIMIT values
  • [x] Verified warning logs appear at 90% threshold
  • [x] Confirmed Bedrock retry mechanism works correctly
  • [x] Validated backward compatibility without configuration
  • [x] Tested with multiple embedding model providers

Breaking Changes

None - All changes are fully backward compatible.

danielaskdd avatar Nov 14 '25 11:11 danielaskdd

@codex review

danielaskdd avatar Nov 14 '25 11:11 danielaskdd

@codex review

danielaskdd avatar Nov 14 '25 12:11 danielaskdd

@codex review

danielaskdd avatar Nov 14 '25 14:11 danielaskdd

@codex review

danielaskdd avatar Nov 14 '25 14:11 danielaskdd

@codex review

danielaskdd avatar Nov 14 '25 15:11 danielaskdd

Codex Review: Didn't find any major issues. Another round soon, please!
