spider Annotation Issues [Please report any annotation errors here, thanks!]

Hi,

Even though our group spent a lot of time and effort creating the Spider dataset, it certainly contains some annotation errors. We would appreciate it if you reported any you find here. We will do our best to correct them in our next release.

Thanks for your interest!

Best, Tao

Feb 13 '19 16:02 taoyds

SQL annotation errors: issue 23
Schema errors: issue 17, issue 20

Feb 13 '19 16:02 taoyds

Hi, thanks again for providing this valuable resource.

Wanted to report some errors in the dev set:

1. NLQ: What is the name and capacity of the stadium with the most concerts after 2013? The SQL query uses >= 2014. That is technically correct, but I don't think it makes as much semantic sense as > 2013.
2. NLQ: What is the smallest weight of the car produced with 8 cylinders on 1974? The SQL query incorrectly references Cylinders = 4.
3. NLQ: What is the minimu weight of the car with 8 cylinders produced in 1974? Same as the previous one, and "minimum" is misspelled.
4. NLQ: Among the cars that do not have the minimum horsepower, what are the make ids and names of al those with less than 4 cylinders? Similar to 1, it should be < 4 in the SQL, not <= 3.
5. NLQ: List the name of teachers whose hometown is not "Little Lever Urban District". The SQL query condition is missing a "t": little lever urban distric.
6. NLQ: What are the names of the teachers whose hometown is not "Little Lever Urban District"? Same as above.
7. NLQ: What is the mobile phone number of the student named Timothy Ward? The actual database has the value Timmothy, as does the SQL query. (This is debatably incorrect.)
8. NLQ: What is the total population and average area of countries in the continent of North America whose area is bigger than 3000？ The question mark here is an unusual (full-width) character.
9. NLQ: Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than 3000. "North America" is misspelled.
10. NLQ: Return the names of cities that have a population between 160000 and 900000. The query incorrectly uses 90000 (missing a 0).

Mar 15 '19 22:03 chrisjbaik

@chrisjbaik Thanks a lot!

Mar 20 '19 14:03 taoyds

There is a bug in the players table of the wta_1 database. The first line of the wta_1.sql file is corrupted: "CRloser_rank_pointsEATE TABLE players(". If you run "select * from players", or any other query on the players table, you will get an error.
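A minimal sketch of how a corrupted schema script like this could be detected, assuming the script is loaded into a scratch in-memory SQLite database. The helper name `load_schema` and the shortened one-column statements are hypothetical stand-ins, not the real contents of wta_1.sql.

```python
import sqlite3


def load_schema(sql_text):
    """Run a schema script in a scratch in-memory database.

    Returns None on success, or the SQLite error message on failure.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(sql_text)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


# Hypothetical, shortened stand-ins for the first statement of wta_1.sql.
broken = 'CRloser_rank_pointsEATE TABLE players("player_id" INT PRIMARY KEY);'
fixed = 'CREATE TABLE players("player_id" INT PRIMARY KEY);'

print(load_schema(broken))  # an error message from SQLite
print(load_schema(fixed))   # None
```

Running such a check over every .sql file in the release would flag this kind of damage before any query is evaluated.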

Apr 19 '19 16:04 ygan

Do you plan to incorporate the annotation fixes in a future data release? That would be great.

The dataset .zip file released on the website still contains the original errors.

Oct 29 '19 21:10 todpole3

Yes! We are planning to fix annotation errors and release Spider 2.0 by January 2020. Also, we will update the evaluation script. Thanks!

Nov 16 '19 14:11 taoyds

NLQ: "How many acting statuses are there?" SQL: "SELECT count(DISTINCT temporary_acting) FROM management"

I feel there is some ambiguity here: it could be the number of distinct statuses, or it could just be the total number.
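To make the two readings concrete, here is a small sketch on made-up data. Only the table name `management` and the column `temporary_acting` come from the query above; the `row_id` column and all values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_id and the inserted values are invented for this illustration.
conn.execute("CREATE TABLE management (row_id INT, temporary_acting TEXT)")
conn.executemany(
    "INSERT INTO management VALUES (?, ?)",
    [(1, "Yes"), (2, "No"), (3, "Yes"), (4, "No"), (5, "Yes")],
)

# Reading 1: how many different status values exist.
distinct_statuses = conn.execute(
    "SELECT count(DISTINCT temporary_acting) FROM management"
).fetchone()[0]

# Reading 2: how many status entries exist in total.
total_statuses = conn.execute(
    "SELECT count(temporary_acting) FROM management"
).fetchone()[0]

print(distinct_statuses, total_statuses)  # 2 5
```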

Nov 21 '19 14:11 andrewbury

Hi!

I would like to report a small inconsistency in the db_id = formula_1 entry in tables.json.

The order of tables in the "table_names" and "table_names_original" fields differs, making it difficult to match them automatically.
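A rough sketch of how such ordering mismatches could be flagged, assuming each tables.json entry is a dict with parallel "table_names" and "table_names_original" lists. The comparison rule (lowercase, keep only letters and digits) is a heuristic chosen for this example, and the sample entry is made up rather than copied from formula_1.

```python
import re


def normalize(name):
    """Lowercase a table name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def misaligned_tables(entry):
    """Return the indices where the two name lists do not appear to line up."""
    pairs = zip(entry["table_names"], entry["table_names_original"])
    return [
        i
        for i, (pretty, original) in enumerate(pairs)
        if normalize(pretty) != normalize(original)
    ]


# Made-up entry with the two lists in different orders.
shuffled = {
    "db_id": "example_db",
    "table_names_original": ["circuits", "races", "pitStops"],
    "table_names": ["races", "pit stops", "circuits"],
}
aligned = {
    "db_id": "example_db",
    "table_names_original": ["circuits", "races", "pitStops"],
    "table_names": ["circuits", "races", "pit stops"],
}

print(misaligned_tables(shuffled))  # [0, 1, 2]
print(misaligned_tables(aligned))   # []
```

Because the rule is purely textual, it also flags deliberate renames where the readable name expands an abbreviation, so any hits still need a manual look.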

Thanks!

Dec 03 '19 02:12 whwang299

Hi

In the dev dataset, I found these two questions. QuestionA: What is the average and the maximum capacity of all stadiums ? QuestionB: What is the average and maximum capacities for all stations ? PS: The stadium table has a column named "AVERAGE".

Both are annotated with the same SQL: SELECT avg(capacity) , max(capacity) FROM stadium. I suppose that is right for QuestionB, but QuestionA should be annotated as select Average , max ( Capacity ) from stadium
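The two annotations generally return different values, which the following sketch shows on invented rows. The column set and the data are assumptions for illustration only. Note that in SQLite, a bare column selected next to max() is taken from the row that holds the maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented schema subset and rows; only the column names Capacity and
# Average are taken from the discussion above.
conn.execute("CREATE TABLE stadium (Name TEXT, Capacity INT, Average INT)")
conn.executemany(
    "INSERT INTO stadium VALUES (?, ?, ?)",
    [("A", 1000, 600), ("B", 3000, 900)],
)

# Aggregate reading: the mean of the Capacity column.
aggregated = conn.execute(
    "SELECT avg(Capacity), max(Capacity) FROM stadium"
).fetchall()

# Column reading: the stored Average value (from the max-capacity row).
column_based = conn.execute(
    "SELECT Average, max(Capacity) FROM stadium"
).fetchall()

print(aggregated)    # [(2000.0, 3000)]
print(column_based)  # [(900, 3000)]
```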

Thanks!

Dec 12 '19 10:12 longxudou

Adding issue #10 for the train_gold.sql file, which is still unfixed.

For the question What is the description of the type of the company who concluded its contracts most recently? the correct SQL should be SELECT T1.company_type FROM Third_Party_Companies AS T1 JOIN Maintenance_Contracts AS T2 ON T1.company_id = T2.maintenance_contract_company_id ORDER BY T2.contract_end_date DESC LIMIT 1

Mar 05 '20 18:03 mnoukhov

Sorry for the delay! We finally corrected some annotation errors and label mismatches (not errors) in the Spider dev and test sets (~4% of dev examples updated, click here for more details). Thanks for your input!

Jun 08 '20 08:06 taoyds

Schema error: issue #53

Jun 08 '20 19:06 CrafterKolyan

@taoyds ygan already reported this problem on Apr 19, 2019. Please fix it.

Jun 08 '20 19:06 CrafterKolyan

As I didn't want to wait one more year for fixes to come, I've already done everything myself.
Spider Dataset from 06/07/2020 with fixed wta_1: https://drive.google.com/file/d/1m68AHHPC4pqyjT-Zmt-u8TRqdw5vp-U5/view
Fixed evaluation.py, which is robust to decoding issues in the wta_1 database and gives 1.0 accuracy, recall and F1 when dev_gold.sql is used as both the gold and the predicted labels: https://github.com/CrafterKolyan/spider-fixed/blob/master/evaluation.py

Jun 14 '20 15:06 CrafterKolyan

See #56

Jul 02 '20 11:07 aosokin

See #57

Jul 02 '20 11:07 aosokin

See #58

Jul 02 '20 12:07 aosokin

In the train set, the NL question: What are the first names of the faculty members playing both Canoeing and Kayaking?

The gold SQL is incorrect:

SELECT T1.lname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T2.actid WHERE T3.activity_name = 'Canoeing' INTERSECT SELECT T1.lname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T2.actid WHERE T3.activity_name = 'Kayaking'

Should be:

SELECT T1.fname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T3.actid WHERE T3.activity_name = 'Canoeing' INTERSECT SELECT T1.fname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T3.actid WHERE T3.activity_name = 'Kayaking'
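The effect of the join condition can be seen on a toy database. This is a sketch with invented rows: T2.actid = T2.actid is always true, so every participation row is paired with every activity, while T2.actid = T3.actid pairs each row only with its own activity. The helper `first_names` is a hypothetical name used for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows: Ann does both activities, Bob only does Canoeing.
conn.executescript("""
CREATE TABLE Faculty (FacID INT, Lname TEXT, Fname TEXT);
CREATE TABLE Faculty_participates_in (FacID INT, actid INT);
CREATE TABLE Activity (actid INT, activity_name TEXT);
INSERT INTO Faculty VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
INSERT INTO Faculty_participates_in VALUES (1, 10), (1, 20), (2, 10);
INSERT INTO Activity VALUES (10, 'Canoeing'), (20, 'Kayaking');
""")

HALF = (
    "SELECT T1.fname FROM Faculty AS T1 "
    "JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID "
    "JOIN Activity AS T3 ON {cond} "
    "WHERE T3.activity_name = '{activity}'"
)


def first_names(cond):
    """Run the INTERSECT query with the given join condition on Activity."""
    query = (
        HALF.format(cond=cond, activity="Canoeing")
        + " INTERSECT "
        + HALF.format(cond=cond, activity="Kayaking")
    )
    return sorted(row[0] for row in conn.execute(query))


print(first_names("T2.actid = T2.actid"))  # ['Ann', 'Bob']
print(first_names("T2.actid = T3.actid"))  # ['Ann']
```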

Aug 17 '20 08:08 tomerwolgithub

A few unexpected table names were found in tables.json, e.g. the sqlite_sequence table in the world_1 database.
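For context, SQLite creates sqlite_sequence automatically when a table uses AUTOINCREMENT, so it shows up whenever table names are read straight from the database. A minimal sketch of filtering out such internal tables follows; the helper name `user_tables` and the toy `city` table are assumptions for illustration.

```python
import sqlite3


def user_tables(conn):
    """List table names, skipping SQLite's internal sqlite_* tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return sorted(name for (name,) in rows if not name.startswith("sqlite_"))


conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite create the internal sqlite_sequence table.
conn.execute(
    "CREATE TABLE city (ID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT)"
)

all_tables = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)

print(all_tables)         # ['city', 'sqlite_sequence']
print(user_tables(conn))  # ['city']
```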

credits to Pedro

Apr 26 '21 07:04 taoyds

See #70

Sep 08 '21 12:09 ReinierKoops

#75 Can you please look into it?

Sep 09 '21 07:09 alan-ai-learner

In tables.json, "useracct" should be "user account" in table_names (not in table_names_original).

Nov 23 '21 13:11 ReinierKoops
