vanna icon indicating copy to clipboard operation
vanna copied to clipboard

[Query question] How the vanna ai get the correct value in database table?

Open relic-yuexi opened this issue 2 years ago β€’ 2 comments

First of all, thank you for your efforts. I have a question, how do you ensure that vanna ai gets the correct data, assuming there are ten tables, each table has ten fields, and each table has more than 10,000 data. If I don’t know the table name and field name, just ask a question related to the value of a data item inside, how can AI answer it?

relic-yuexi avatar Jan 18 '24 01:01 relic-yuexi

I have a problem similar to yours. And I found that none of the text2sql models/studies explained how to get and map the question content to the specific value of the data in the database table. Take training_data in this project as an example:

{ "question":"What are the 10 countries with highest government debt in 2008 ? ", "answer":"SELECT geo_name,\r\n value\r\nFROM data_commons_public_data.cybersyn.timeseries join data_commons_public_data.cybersyn.geo_index\r\n ON timeseries.geo_id = geo_index.id\r\nWHERE variable_name = 'Amount of Debt: Government'\r\n and date = '2008-01-01'\r\n and geo_id like 'country\/%'\r\nORDER BY value desc limit 10;" } How can we know the variable_name field has a value Amount of Debt: Government but not government debt?

chenryn avatar Jan 22 '24 09:01 chenryn

I have a problem similar to yours. And I found that none of the text2sql models/studies explained how to get and map the question content to the specific value of the data in the database table. Take training_data in this project as an example:

{ "question":"What are the 10 countries with highest government debt in 2008 ? ", "answer":"SELECT geo_name,\r\n value\r\nFROM data_commons_public_data.cybersyn.timeseries join data_commons_public_data.cybersyn.geo_index\r\n ON timeseries.geo_id = geo_index.id\r\nWHERE variable_name = 'Amount of Debt: Government'\r\n and date = '2008-01-01'\r\n and geo_id like 'country\/%'\r\nORDER BY value desc limit 10;" } How can we know the variable_name field has a value Amount of Debt: Government but not government debt?

what this project has done maybe is just add domentation to rectify the answer.

But from my experience , we can just add distinct value of field to the documentation.

relic-yuexi avatar Jan 23 '24 01:01 relic-yuexi