test-suite-sql-eval
test-suite-sql-eval copied to clipboard
Bug with evaluating SQLs containing "value" identifiers
I download spider dataset from Spider Leaderboard.
When I run the command:
python evaluation.py --db spider/test_database --etype exec --gold spider/test_data/dev_gold.sql --pred spider/test_data/dev_gold.sql
Then returns:
easy medium hard extra all
count 470 857 463 357 2147
===================== EXECUTION ACCURACY =====================
execution 0.996 0.977 0.983 0.994 0.985
However, in above command, the gold file and pred file are the same.
I find the problem in evaluation.py
, code link (line# 569):
https://github.com/taoyds/test-suite-sql-eval/blob/e97acc546ecbee8fa27fa8dbf025ef61493a876c/evaluation.py#L569C13-L569C48
When predicted sql contains "value" identifiers (e.g. a column called "value"), p_str = p_str.replace("value", "1")
replace string "value" in predicted sql incorrectly.
There are such sqls in Spider-test dataset:
SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1 SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1 SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1 SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1 SELECT avg(value) , sum(value) FROM boxes warehouse_1 SELECT avg(value) , sum(value) FROM boxes warehouse_1 SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1 SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1 SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1 SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1 SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1 SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1 SELECT avg(value) FROM boxes warehouse_1 SELECT avg(value) FROM boxes warehouse_1 ......
When I delete p_str = p_str.replace("value", "1")
, the evaluation returns 100% execution score as expected.
@taoyds @rizar @ruiqi-zhong