test-suite-sql-eval
test-suite-sql-eval copied to clipboard
Inconsistent SQL Query Results on Different Systems
I encountered an issue while using your project. The problem arises because the pred results differ between Windows and Linux systems.
Description
Gold SQL:
SELECT YEAR FROM concert GROUP BY YEAR ORDER BY count(*) DESC LIMIT 1;
Pred SQL:
SELECT MAX(Year) FROM concert;
Schema and Data The table schema and sample data used are as follows:
CREATE TABLE "concert" (
"concert_ID" int,
"concert_Name" text,
"Theme" text,
"Stadium_ID" text,
"Year" text,
PRIMARY KEY ("concert_ID"),
FOREIGN KEY ("Stadium_ID") REFERENCES "stadium"("Stadium_ID")
);
INSERT INTO "concert" VALUES (1, "Auditions", "Free choice", 1, 2014);
INSERT INTO "concert" VALUES (2, "Super bootcamp", "Free choice 2", 2, 2014);
INSERT INTO "concert" VALUES (3, "Home Visits", "Bleeding Love", 2, 2015);
INSERT INTO "concert" VALUES (4, "Week 1", "Wide Awake", 10, 2014);
INSERT INTO "concert" VALUES (5, "Week 1", "Happy Tonight", 9, 2015);
INSERT INTO "concert" VALUES (6, "Week 2", "Party All Night", 7, 2015);
Observations
When executing the queries on different systems, the results for pred were inconsistent:
On Windows 11:
Gold: [('2014',)] Pred: [('2015',)]
On Linux:
Gold: [('2015',)] Pred: [('2015',)]
Analysis
Both 2014 and 2015 have the same count of concerts (3 each). Therefore, either year could be returned as the correct result. However, the discrepancy between systems suggests an underlying difference in implementation that affects the pred query results.
The main issue is that pred is not consistent across different systems, leading to potential false negatives when comparing it to gold. This example highlights a potential flaw in the evaluation method used by the project.
Conclusion
This issue demonstrates that the same SQL query can yield different results on different systems, affecting the reliability of the evaluation process. Further investigation into the underlying cause of this discrepancy and how to handle such cases in the evaluation would be valuable.