Embedder high token count and column pruning
Good afternoon, we are facing a problem when we use any embedder. In our case we have very wide tables, in terms of the number of columns. When Wren AI tries to deploy the schema, this is what happens:
Traceback (most recent call last):
File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 3262, in aembedding
response = await init_response # type: ignore
^^^^^^^^^^^^^^^^^^^
File "/app/.venv/lib/python3.12/site-packages/litellm/llms/vertex_ai/vertex_embeddings/embedding_handler.py", line 208, in async_embedding
raise VertexAIError(status_code=error_code, message=err.response.text)
litellm.llms.vertex_ai.vertex_ai_non_gemini.VertexAIError: {
"error": {
"code": 400,
"message": "Unable to submit request because the input token count is 20431 but the model supports up to 20000. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models",
"status": "INVALID_ARGUMENT"
We managed to fix the problem by changing your config template to:
column_indexing_batch_size: 10
table_retrieval_size: 3
table_column_retrieval_size: 20
enable_column_pruning: true
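For reference, these keys sit under the settings block of the wren-ai-service config.yaml:

settings:
  column_indexing_batch_size: 10
  table_retrieval_size: 3
  table_column_retrieval_size: 20
  enable_column_pruning: true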
Is this a reliable solution? At the moment it is working, but it shows a warning about column pruning (as expected):
db_schemas token count is greater than 100,000, so we will prune columns
Is there a better way to solve this problem? Is there an optimal configuration, considering that the table has many columns and cannot be split? Is it possible to change the chunking in order to avoid column pruning?
Thank you in advance, Jacopo
@corzani
Hi, thanks for reaching out.
For the first part, where you changed your config to fix the indexing issue: the only line you need to adjust is column_indexing_batch_size. The others are used for retrieval.
For the second part, it depends heavily on the capabilities of your LLM. If your LLM has a large enough context window and can still answer questions while ingesting all of the columns, I suggest you set enable_column_pruning to false and add context_window_size to your model in config.yaml. You can check the reference here: https://github.com/Canner/WrenAI/blob/0c0697b0836abb39c3f3b65d2dc0f3474b0a112f/docker/config.example.yaml#L7
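For example, a minimal sketch of where that key goes (the model name and window size below are placeholders; use your own model and its real limit):

type: llm
provider: litellm_llm
models:
  - model: <your-llm-model>
    alias: default
    timeout: 120
    context_window_size: 200000  # placeholder: set this to the context window your LLM actually supports
    kwargs:
      n: 1
      temperature: 0
---
settings:
  enable_column_pruning: false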
Feel free to ask further questions if you still feel confused, thanks
Thanks @cyyeh , I really appreciate your help.
We keep receiving the warning "db_schemas token count is greater than 100,000, so we will prune columns".
In the end we set context_window_size to 18k, because the error on vertex_ai/text-multilingual-embedding-002 was:
Unable to submit request because the input token count is 20431 but the model supports up to 20000. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
So we set something slightly under 20k; we chose 18k to "be sure", since setting it to exactly 20000 seemed risky.
Questions...
Even though we set enable_column_pruning to false, this is what the logs say:
I0616 08:50:02.081 8 wren-ai-service:294] db_schemas token count is greater than 100,000, so we will prune columns
Would it be advisable to reset everything from scratch (re-deploy from the config) before changing those settings? Can this warning be related to the other models, i.e. the ones that are not the embedder?
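For instance, would we also need to add context_window_size to the LLM entries, e.g. something like this on the default model (the value below is just our guess at what gemini-2.0-flash-lite supports, we have not verified it)?

- model: vertex_ai/gemini-2.0-flash-lite
  alias: default
  timeout: 120
  context_window_size: 1048576  # guessed value for gemini-2.0-flash-lite, not verified
  kwargs:
    n: 1
    temperature: 0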
The full config is:
apiVersion: v1
kind: ConfigMap
metadata:
name: wren-config
data:
# Wren Engine Service Port
WREN_ENGINE_PORT: "8080"
# Wren AI Service Port
WREN_AI_SERVICE_PORT: "5555"
WREN_UI_ENDPOINT: http://wren-ui-svc:3000
#Release version used by wren ui https://github.com/Canner/WrenAI/blob/main/docker/docker-compose.yaml#L85-L88
WREN_PRODUCT_VERSION: "0.12.0"
WREN_ENGINE_VERSION: "0.12.3"
WREN_AI_SERVICE_VERSION: "0.12.1"
WREN_UI_VERSION: "0.17.6"
# Document store related
QDRANT_HOST: "wren-qdrant"
# Telemetry
POSTHOG_HOST: "https://app.posthog.com"
TELEMETRY_ENABLED: "false"
# this is for telemetry to know the model, i think ai-service might be able to provide a endpoint to get the information
GENERATION_MODEL: "gpt-4o-mini-2024-07-18"
# service endpoints of AI service & engine service
WREN_ENGINE_ENDPOINT: "http://wren-engine-svc:8080"
WREN_AI_ENDPOINT: "http://wren-ai-service-svc:5555"
#WREN_AI_ENDPOINT: "http://wren-ai-service-svc.ai-system.svc.cluster.local:5555"
# "pg" for postgres as UI application database
WREN_UI_DB_TYPE: pg
#For bootstrap
WREN_ENGINE_DATA_PATH: "/app/data"
### if DB_TYPE = "postgres" you must provide PG_URL string in the *Secret* manifest file (deployment/kustomizations/examples/secret-wren_example.yaml) to connect to postgres
#DEBUG, INFO
LOGGING_LEVEL: INFO
IBIS_SERVER_ENDPOINT: http://wren-ibis-server-svc:8000
---
apiVersion: v1
kind: ConfigMap
metadata:
name: wren-ai-service-config
data:
config.yaml: |
type: llm
provider: litellm_llm
models:
- model: vertex_ai/gemini-2.0-flash-lite
alias: default
timeout: 120
kwargs:
n: 1
temperature: 0
- model: vertex_ai/gemini-2.0-flash-lite
alias: gemini-llm-for-chart
timeout: 120
kwargs:
n: 1
temperature: 0
response_format:
type: json_object
---
type: embedder
provider: litellm_embedder
models:
- model: vertex_ai/text-multilingual-embedding-002
context_window_size: 18000 # this embedder model handles max 20000 token per request
alias: default
timeout: 120
---
type: engine
provider: wren_ui
endpoint: http://wren-ui-svc:3000
---
type: engine
provider: wren_ibis
endpoint: http://wren-ibis-server-svc:8000
---
type: document_store
provider: qdrant
location: http://wren-qdrant:6333
embedding_model_dim: 768 # put your embedding model dimension here
timeout: 120
recreate_index: true
---
type: pipeline
pipes:
- name: db_schema_indexing
embedder: litellm_embedder.default
document_store: qdrant
- name: historical_question_indexing
embedder: litellm_embedder.default
document_store: qdrant
- name: table_description_indexing
embedder: litellm_embedder.default
document_store: qdrant
- name: db_schema_retrieval
llm: litellm_llm.default
embedder: litellm_embedder.default
document_store: qdrant
- name: historical_question_retrieval
embedder: litellm_embedder.default
document_store: qdrant
- name: sql_generation
llm: litellm_llm.default
engine: wren_ui
- name: sql_summary
llm: litellm_llm.default
- name: sql_correction
llm: litellm_llm.default
engine: wren_ui
- name: followup_sql_generation
llm: litellm_llm.default
engine: wren_ui
- name: sql_answer
llm: litellm_llm.default
- name: sql_breakdown
llm: litellm_llm.default
- name: sql_expansion
llm: litellm_llm.default
- name: semantics_description
llm: litellm_llm.default
- name: relationship_recommendation
llm: litellm_llm.default
engine: wren_ui
- name: question_recommendation
llm: litellm_llm.default
- name: question_recommendation_db_schema_retrieval
llm: litellm_llm.default
embedder: litellm_embedder.default
document_store: qdrant
- name: question_recommendation_sql_generation
llm: litellm_llm.default
engine: wren_ui
- name: chart_generation
llm: litellm_llm.gemini-llm-for-chart
- name: chart_adjustment
llm: litellm_llm.gemini-llm-for-chart
- name: intent_classification
llm: litellm_llm.default
embedder: litellm_embedder.default
document_store: qdrant
- name: misleading_assistance
llm: litellm_llm.default
- name: data_assistance
llm: litellm_llm.default
- name: sql_pairs_indexing
document_store: qdrant
embedder: litellm_embedder.default
- name: sql_pairs_retrieval
document_store: qdrant
embedder: litellm_embedder.default
llm: litellm_llm.default
- name: preprocess_sql_data
llm: litellm_llm.default
- name: sql_executor
engine: wren_ui
- name: user_guide_assistance
llm: litellm_llm.default
- name: sql_question_generation
llm: litellm_llm.default
- name: sql_generation_reasoning
llm: litellm_llm.default
- name: followup_sql_generation_reasoning
llm: litellm_llm.default
- name: sql_regeneration
llm: litellm_llm.default
engine: wren_ui
- name: instructions_indexing
embedder: litellm_embedder.default
document_store: qdrant
- name: instructions_retrieval
embedder: litellm_embedder.default
document_store: qdrant
- name: sql_functions_retrieval
engine: wren_ibis
document_store: qdrant
- name: project_meta_indexing
document_store: qdrant
- name: sql_tables_extraction
llm: litellm_llm.default
---
settings:
engine_timeout: 30
column_indexing_batch_size: 10
table_retrieval_size: 10
table_column_retrieval_size: 100
allow_intent_classification: true
allow_sql_generation_reasoning: true
allow_sql_functions_retrieval: true
enable_column_pruning: false
max_sql_correction_retries: 3
query_cache_maxsize: 1000
query_cache_ttl: 3600
langfuse_host: https://cloud.langfuse.com
langfuse_enable: true
logging_level: DEBUG
development: true
historical_question_retrieval_similarity_threshold: 0.9
sql_pairs_similarity_threshold: 0.7
sql_pairs_retrieval_max_size: 10
instructions_similarity_threshold: 0.7
instructions_top_k: 10
Thanks, Jacopo