How to clarify the meaning of a schema column in structured data
I am using mysql as a structured data source and sometimes the query does not understand exactly what the column means, resulting in an undesirable response, here is my code.
from pathlib import Path
from llama_index import GPTSQLStructStoreIndex, SQLDatabase
from sqlalchemy import create_engine
engine = create_engine("mysql+mysqlconnector://root:123456@localhost:3306/xxx")
sql_database = SQLDatabase(engine, include_tables=["t_goods"])
index = GPTSQLStructStoreIndex(
[],
sql_database=sql_database,
table_name="t_goods",
)
response = index.query("What is the weight and physical dimensions of the WDM50-2805SL? Please answer in Chinese")
print(response)
The table t_goods has the columns model, weight, size. Mass and weight are the same field, and also commented on the weight column: weight or mass, but he cannot identify or determine that weight is a mass value, only that it is a weight.
Hey @shangdev, did you try improving the query string with the information you describe? Or improving the prompt template to better give this information to the LLM?
@Disiok when i add sql_context_container, he return correct result.
# manually set text
table_context_dict={"t_goods": "字段weight表示重量或质量,字段spec表示产品型号,如:xxx"}
context_builder = SQLContextContainerBuilder(sql_database, context_dict=table_context_dict)
context_container = context_builder.build_context_container()
index = GPTSQLStructStoreIndex(
[],
sql_database=sql_database,
table_name="t_goods",
sql_context_container=context_container
)
result is
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 693 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
[('95g±2g', '76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05')]
SELECT weight, size FROM t_goods WHERE spec = 'xxx';
What I want to know is why he returns not a semantic text like "weight is 95g±2g, size is 76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05", but also a result like [('x','y')]
@Disiok when i add sql_context_container, he return correct result.
# manually set text table_context_dict={"t_goods": "字段weight表示重量或质量,字段spec表示产品型号,如:xxx"} context_builder = SQLContextContainerBuilder(sql_database, context_dict=table_context_dict) context_container = context_builder.build_context_container() index = GPTSQLStructStoreIndex( [], sql_database=sql_database, table_name="t_goods", sql_context_container=context_container )result is
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 693 tokens INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens [('95g±2g', '76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05')] SELECT weight, size FROM t_goods WHERE spec = 'xxx';What I want to know is why he returns not a semantic text like "weight is 95g±2g, size is 76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05", but also a result like [('x','y')]
adding a semantic response and not just the raw SQL response is something that's in the works :)