llama_index How to clarify the meaning of a schema column in structured data

I am using mysql as a structured data source and sometimes the query does not understand exactly what the column means, resulting in an undesirable response, here is my code.

from pathlib import Path
from llama_index import GPTSQLStructStoreIndex, SQLDatabase
from sqlalchemy import create_engine

engine = create_engine("mysql+mysqlconnector://root:123456@localhost:3306/xxx")
sql_database = SQLDatabase(engine, include_tables=["t_goods"])
index = GPTSQLStructStoreIndex(
    [], 
    sql_database=sql_database, 
    table_name="t_goods",
)

response = index.query("What is the weight and physical dimensions of the WDM50-2805SL? Please answer in Chinese")
print(response)

The table t_goods has the columns model, weight, size. Mass and weight are the same field, and also commented on the weight column: weight or mass, but he cannot identify or determine that weight is a mass value, only that it is a weight.

Mar 20 '23 09:03 shangdev

Hey @shangdev, did you try improving the query string with the information you describe? Or improving the prompt template to better give this information to the LLM?

Mar 25 '23 02:03 Disiok

@Disiok when i add sql_context_container, he return correct result.

# manually set text
table_context_dict={"t_goods": "字段weight表示重量或质量，字段spec表示产品型号，如：xxx"}
context_builder = SQLContextContainerBuilder(sql_database, context_dict=table_context_dict)
context_container = context_builder.build_context_container()

index = GPTSQLStructStoreIndex(
    [], 
    sql_database=sql_database, 
    table_name="t_goods",
    sql_context_container=context_container
)

result is

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 693 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
[('95g±2g', '76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05')]
SELECT weight, size FROM t_goods WHERE spec = 'xxx';

What I want to know is why he returns not a semantic text like "weight is 95g±2g, size is 76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05", but also a result like [('x','y')]

Mar 25 '23 02:03 shangdev

@Disiok when i add sql_context_container, he return correct result.

# manually set text
table_context_dict={"t_goods": "字段weight表示重量或质量，字段spec表示产品型号，如：xxx"}
context_builder = SQLContextContainerBuilder(sql_database, context_dict=table_context_dict)
context_container = context_builder.build_context_container()

index = GPTSQLStructStoreIndex(
    [], 
    sql_database=sql_database, 
    table_name="t_goods",
    sql_context_container=context_container
)

result is

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 693 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
[('95g±2g', '76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05')]
SELECT weight, size FROM t_goods WHERE spec = 'xxx';

What I want to know is why he returns not a semantic text like "weight is 95g±2g, size is 76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05", but also a result like [('x','y')]

adding a semantic response and not just the raw SQL response is something that's in the works :)

Mar 25 '23 16:03 jerryjliu