test-suite-sql-eval icon indicating copy to clipboard operation
test-suite-sql-eval copied to clipboard

Bug with evaluating SQLs containing "value" identifiers

Open BugMaker-Boyan opened this issue 1 year ago • 1 comments

I download spider dataset from Spider Leaderboard.

When I run the command: python evaluation.py --db spider/test_database --etype exec --gold spider/test_data/dev_gold.sql --pred spider/test_data/dev_gold.sql

Then returns: easy medium hard extra all
count 470 857 463 357 2147
===================== EXECUTION ACCURACY ===================== execution 0.996 0.977 0.983 0.994 0.985

However, in above command, the gold file and pred file are the same.

I find the problem in evaluation.py, code link (line# 569): https://github.com/taoyds/test-suite-sql-eval/blob/e97acc546ecbee8fa27fa8dbf025ef61493a876c/evaluation.py#L569C13-L569C48

When predicted sql contains "value" identifiers (e.g. a column called "value"), p_str = p_str.replace("value", "1") replace string "value" in predicted sql incorrectly.

There are such sqls in Spider-test dataset:

SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1 SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1 SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1 SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1 SELECT avg(value) , sum(value) FROM boxes warehouse_1 SELECT avg(value) , sum(value) FROM boxes warehouse_1 SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1 SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1 SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1 SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1 SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1 SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1 SELECT avg(value) FROM boxes warehouse_1 SELECT avg(value) FROM boxes warehouse_1 ......

When I delete p_str = p_str.replace("value", "1"), the evaluation returns 100% execution score as expected.

BugMaker-Boyan avatar Jan 15 '24 13:01 BugMaker-Boyan

@taoyds @rizar @ruiqi-zhong

BugMaker-Boyan avatar Jan 25 '24 05:01 BugMaker-Boyan