
Embedder high token count and column pruning

Open corzani opened this issue 7 months ago • 2 comments

Good afternoon, we are facing a problem with every embedder we use. In our case we have very wide tables, in terms of the number of columns. When Wren AI tries to deploy the schema, this is what happens:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3262, in aembedding
    response = await init_response  # type: ignore
               ^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/litellm/llms/vertex_ai/vertex_embeddings/embedding_handler.py", line 208, in async_embedding
    raise VertexAIError(status_code=error_code, message=err.response.text)
litellm.llms.vertex_ai.vertex_ai_non_gemini.VertexAIError: {
  "error": {
    "code": 400,
    "message": "Unable to submit request because the input token count is 20431 but the model supports up to 20000. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models",
    "status": "INVALID_ARGUMENT"
  }
}

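The error message points at Google's CountTokens API; a client-side alternative is litellm's token_counter helper, which can estimate a request's size before calling the embedder (for non-OpenAI models it may fall back to a generic tokenizer, so treat the count as approximate). A minimal sketch, assuming the 20,000-token cap reported above:

# Minimal sketch (not WrenAI code): pre-check a schema chunk's token count
# before sending it to the embedder. The 20,000 limit is the per-request cap
# from the Vertex AI error above; the 10% headroom is an assumption.
from litellm import token_counter

EMBEDDER_MODEL = "vertex_ai/text-multilingual-embedding-002"
MAX_TOKENS = 20_000

def fits_in_one_request(text: str, headroom: float = 0.1) -> bool:
    # token_counter may only approximate the tokenizer of Vertex AI models
    tokens = token_counter(model=EMBEDDER_MODEL, text=text)
    return tokens <= MAX_TOKENS * (1 - headroom)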
We managed to fix the problem by changing your config template to:

column_indexing_batch_size: 10
table_retrieval_size: 3 
table_column_retrieval_size: 20
enable_column_pruning: true

Is this a reliable solution? It is working at the moment, but it shows a warning about column pruning (as expected):

db_schemas token count is greater than 100,000, so we will prune columns

Is there a better way to solve this problem? Is there an optimal configuration, considering that the table has many columns and cannot be split? Is it possible to change the chunking in order to avoid column pruning?

Thank you in advance, Jacopo

corzani avatar Jun 13 '25 13:06 corzani

@corzani

Hi, thanks for reaching out. For the first part, where you changed your config to fix the indexing issue, the only line you need to adjust is column_indexing_batch_size; the others are retrieval settings. For the second part, it highly depends on the capabilities of your LLM. If your LLM has a large enough token window and can still answer questions while ingesting all columns, I suggest you set enable_column_pruning to false and add context_window_size to your model in config.yaml. You can check the reference here: https://github.com/Canner/WrenAI/blob/0c0697b0836abb39c3f3b65d2dc0f3474b0a112f/docker/config.example.yaml#L7
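To illustrate the first point, here is a rough sketch of the batching idea (illustrative only, not WrenAI's actual implementation): column documents are embedded in batches, so a smaller column_indexing_batch_size keeps each embedding request under the model's per-request token cap.

# Rough illustration of column batching during indexing; the function and
# data names below are hypothetical.
from typing import Iterator

def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    # Yield successive batches of at most batch_size column documents.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

column_docs = [f"column_{i}: description ..." for i in range(200)]  # a wide table
for batch in batched(column_docs, batch_size=10):  # column_indexing_batch_size
    ...  # embed(batch): each request now carries far fewer tokens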

Feel free to ask further questions if anything is still unclear, thanks

cyyeh avatar Jun 13 '25 22:06 cyyeh

Thanks @cyyeh, I really appreciate your help. We keep receiving:

db_schemas token count is greater than 100,000, so we will prune columns

In the end we set context_window_size to 18000, because the error on vertex_ai/text-multilingual-embedding-002 was:

Unable to submit request because the input token count is 20431 but the model supports up to 20000. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models

So we picked a value slightly under 20k and chose 18k to "be sure"; I guess setting it to exactly 20000 would be risky.
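In other words, 18000 is roughly a 10% safety margin below the documented 20,000-token cap. As a small worked example (the 10% margin is an assumption, not an official recommendation):

# Hypothetical headroom calculation for context_window_size
MODEL_LIMIT = 20_000           # per-request cap from the Vertex AI error
SAFETY_MARGIN = 0.10           # slack for tokenizer mismatches
safe_window = int(MODEL_LIMIT * (1 - SAFETY_MARGIN))  # -> 18000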

Questions...

Even though we set enable_column_pruning to false, this is what the logs say:

I0616 08:50:02.081 8 wren-ai-service:294] db_schemas token count is greater than 100,000, so we will prune columns

Would it be advisable to reset everything from scratch (re-deploy from the config) before changing those settings? Could this warning come from one of the other models, rather than the embedder?

The full config is:

apiVersion: v1
kind: ConfigMap
metadata:
  name: wren-config
data:
  # Wren Engine Service Port
  WREN_ENGINE_PORT: "8080"
  # Wren AI Service Port
  WREN_AI_SERVICE_PORT: "5555"

  WREN_UI_ENDPOINT: http://wren-ui-svc:3000

  # Release versions used by Wren UI: https://github.com/Canner/WrenAI/blob/main/docker/docker-compose.yaml#L85-L88
  WREN_PRODUCT_VERSION: "0.12.0"
  WREN_ENGINE_VERSION: "0.12.3"
  WREN_AI_SERVICE_VERSION: "0.12.1"
  WREN_UI_VERSION: "0.17.6"

  # Document store related
  QDRANT_HOST: "wren-qdrant"

  # Telemetry
  POSTHOG_HOST: "https://app.posthog.com"
  TELEMETRY_ENABLED: "false"
  # this is for telemetry to know the model; the ai-service might be able to provide an endpoint to get this information
  GENERATION_MODEL: "gpt-4o-mini-2024-07-18"

  # service endpoints of AI service & engine service
  WREN_ENGINE_ENDPOINT: "http://wren-engine-svc:8080"
  WREN_AI_ENDPOINT: "http://wren-ai-service-svc:5555"
  #WREN_AI_ENDPOINT: "http://wren-ai-service-svc.ai-system.svc.cluster.local:5555"

  # "pg" for postgres as UI application database
  WREN_UI_DB_TYPE: pg

  #For bootstrap
  WREN_ENGINE_DATA_PATH: "/app/data"

  ### if DB_TYPE = "postgres" you must provide PG_URL string in the *Secret* manifest file (deployment/kustomizations/examples/secret-wren_example.yaml) to connect to postgres

  #DEBUG, INFO
  LOGGING_LEVEL: INFO

  IBIS_SERVER_ENDPOINT: http://wren-ibis-server-svc:8000
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: wren-ai-service-config
data:
  config.yaml: |
    type: llm
    provider: litellm_llm
    models:
      - model: vertex_ai/gemini-2.0-flash-lite
        alias: default
        timeout: 120
        kwargs:
          n: 1
          temperature: 0
      - model: vertex_ai/gemini-2.0-flash-lite
        alias: gemini-llm-for-chart
        timeout: 120
        kwargs:
          n: 1
          temperature: 0
          response_format:
            type: json_object

    ---
    type: embedder
    provider: litellm_embedder
    models:
      - model: vertex_ai/text-multilingual-embedding-002
        context_window_size: 18000 # this embedder model handles at most 20000 tokens per request
        alias: default
        timeout: 120

    ---
    type: engine
    provider: wren_ui
    endpoint: http://wren-ui-svc:3000

    ---
    type: engine
    provider: wren_ibis
    endpoint: http://wren-ibis-server-svc:8000

    ---
    type: document_store
    provider: qdrant
    location: http://wren-qdrant:6333
    embedding_model_dim: 768 # put your embedding model dimension here
    timeout: 120
    recreate_index: true

    ---
    type: pipeline
    pipes:
      - name: db_schema_indexing
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: historical_question_indexing
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: table_description_indexing
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: db_schema_retrieval
        llm: litellm_llm.default
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: historical_question_retrieval
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: sql_generation
        llm: litellm_llm.default
        engine: wren_ui
      - name: sql_summary
        llm: litellm_llm.default
      - name: sql_correction
        llm: litellm_llm.default
        engine: wren_ui
      - name: followup_sql_generation
        llm: litellm_llm.default
        engine: wren_ui
      - name: sql_answer
        llm: litellm_llm.default
      - name: sql_breakdown
        llm: litellm_llm.default
      - name: sql_expansion
        llm: litellm_llm.default
      - name: semantics_description
        llm: litellm_llm.default
      - name: relationship_recommendation
        llm: litellm_llm.default
        engine: wren_ui
      - name: question_recommendation
        llm: litellm_llm.default
      - name: question_recommendation_db_schema_retrieval
        llm: litellm_llm.default
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: question_recommendation_sql_generation
        llm: litellm_llm.default
        engine: wren_ui
      - name: chart_generation
        llm: litellm_llm.gemini-llm-for-chart
      - name: chart_adjustment
        llm: litellm_llm.gemini-llm-for-chart
      - name: intent_classification
        llm: litellm_llm.default
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: misleading_assistance
        llm: litellm_llm.default
      - name: data_assistance
        llm: litellm_llm.default
      - name: sql_pairs_indexing
        document_store: qdrant
        embedder: litellm_embedder.default
      - name: sql_pairs_retrieval
        document_store: qdrant
        embedder: litellm_embedder.default
        llm: litellm_llm.default
      - name: preprocess_sql_data
        llm: litellm_llm.default
      - name: sql_executor
        engine: wren_ui
      - name: user_guide_assistance
        llm: litellm_llm.default
      - name: sql_question_generation
        llm: litellm_llm.default
      - name: sql_generation_reasoning
        llm: litellm_llm.default
      - name: followup_sql_generation_reasoning
        llm: litellm_llm.default
      - name: sql_regeneration
        llm: litellm_llm.default
        engine: wren_ui
      - name: instructions_indexing
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: instructions_retrieval
        embedder: litellm_embedder.default
        document_store: qdrant
      - name: sql_functions_retrieval
        engine: wren_ibis
        document_store: qdrant
      - name: project_meta_indexing
        document_store: qdrant
      - name: sql_tables_extraction
        llm: litellm_llm.default

    ---
    settings:
      engine_timeout: 30
      column_indexing_batch_size: 10
      table_retrieval_size: 10
      table_column_retrieval_size: 100
      allow_intent_classification: true
      allow_sql_generation_reasoning: true
      allow_sql_functions_retrieval: true
      enable_column_pruning: false
      max_sql_correction_retries: 3
      query_cache_maxsize: 1000
      query_cache_ttl: 3600
      langfuse_host: https://cloud.langfuse.com
      langfuse_enable: true
      logging_level: DEBUG
      development: true
      historical_question_retrieval_similarity_threshold: 0.9
      sql_pairs_similarity_threshold: 0.7
      sql_pairs_retrieval_max_size: 10
      instructions_similarity_threshold: 0.7
      instructions_top_k: 10

Thanks, Jacopo

corzani avatar Jun 16 '25 09:06 corzani