llama_index icon indicating copy to clipboard operation
llama_index copied to clipboard

How to clarify the meaning of a schema column in structured data

Open shangdev opened this issue 2 years ago • 3 comments

I am using mysql as a structured data source and sometimes the query does not understand exactly what the column means, resulting in an undesirable response, here is my code.

from pathlib import Path
from llama_index import GPTSQLStructStoreIndex, SQLDatabase
from sqlalchemy import create_engine

engine = create_engine("mysql+mysqlconnector://root:123456@localhost:3306/xxx")
sql_database = SQLDatabase(engine, include_tables=["t_goods"])
index = GPTSQLStructStoreIndex(
    [], 
    sql_database=sql_database, 
    table_name="t_goods",
)

response = index.query("What is the weight and physical dimensions of the WDM50-2805SL? Please answer in Chinese")
print(response)

The table t_goods has the columns model, weight, size. Mass and weight are the same field, and also commented on the weight column: weight or mass, but he cannot identify or determine that weight is a mass value, only that it is a weight.

shangdev avatar Mar 20 '23 09:03 shangdev

Hey @shangdev, did you try improving the query string with the information you describe? Or improving the prompt template to better give this information to the LLM?

Disiok avatar Mar 25 '23 02:03 Disiok

@Disiok when i add sql_context_container, he return correct result.

# manually set text
table_context_dict={"t_goods": "字段weight表示重量或质量,字段spec表示产品型号,如:xxx"}
context_builder = SQLContextContainerBuilder(sql_database, context_dict=table_context_dict)
context_container = context_builder.build_context_container()

index = GPTSQLStructStoreIndex(
    [], 
    sql_database=sql_database, 
    table_name="t_goods",
    sql_context_container=context_container
)

result is

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 693 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
[('95g±2g', '76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05')]
SELECT weight, size FROM t_goods WHERE spec = 'xxx';

What I want to know is why he returns not a semantic text like "weight is 95g±2g, size is 76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05", but also a result like [('x','y')]

shangdev avatar Mar 25 '23 02:03 shangdev

@Disiok when i add sql_context_container, he return correct result.

# manually set text
table_context_dict={"t_goods": "字段weight表示重量或质量,字段spec表示产品型号,如:xxx"}
context_builder = SQLContextContainerBuilder(sql_database, context_dict=table_context_dict)
context_container = context_builder.build_context_container()

index = GPTSQLStructStoreIndex(
    [], 
    sql_database=sql_database, 
    table_name="t_goods",
    sql_context_container=context_container
)

result is

INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 693 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 0 tokens
[('95g±2g', '76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05')]
SELECT weight, size FROM t_goods WHERE spec = 'xxx';

What I want to know is why he returns not a semantic text like "weight is 95g±2g, size is 76.2±0.2,38±0.2,10.5±0.2,6.35±0.2,1.02±0.05", but also a result like [('x','y')]

adding a semantic response and not just the raw SQL response is something that's in the works :)

jerryjliu avatar Mar 25 '23 16:03 jerryjliu