seq2struct
`primary_keys` produced by Spider preprocessing is wrong
Example (the first record of the encoded training set, database `department_management`):
In [6]: train_enc = json.loads(next(open('data/spider-20190205/nl2code-0401,output_from=false,emb=glove-42B,min_freq=50/enc/train.jsonl')))
In [7]: train_enc
Out[7]:
{'column_to_table': {'0': None,
'1': 0,
'10': 1,
'11': 2,
'12': 2,
'13': 2,
'2': 0,
'3': 0,
'4': 0,
'5': 0,
'6': 0,
'7': 1,
'8': 1,
'9': 1},
'columns': [['<type: text>', '*'],
['<type: number>', 'department', 'id'],
['<type: text>', 'name'],
['<type: text>', 'creation'],
['<type: number>', 'ranking'],
['<type: number>', 'budget', 'in', 'billions'],
['<type: number>', 'num', 'employees'],
['<type: number>', 'head', 'id'],
['<type: text>', 'name'],
['<type: text>', 'born', 'state'],
['<type: number>', 'age'],
['<type: number>', 'department', 'id'],
['<type: number>', 'head', 'id'],
['<type: text>', 'temporary', 'acting']],
'db_id': 'department_management',
'foreign_keys': {'11': 1, '12': 7},
'foreign_keys_tables': {'2': [0, 1]},
'primary_keys': [11, 11, 11],
'question': ['how',
'many',
'heads',
'of',
'the',
'departments',
'are',
'older',
'than',
'56',
'?'],
'table_bounds': [1, 7, 11, 14],
'table_to_columns': {'0': [1, 2, 3, 4, 5, 6],
'1': [7, 8, 9, 10],
'2': [11, 12, 13]},
'tables': [['department'], ['head'], ['management']]}
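The value `'primary_keys': [11, 11, 11]` is obviously incorrect. All three entries point to column 11 (`department id` of the `management` table), so the primary keys of `department` and `head` are missing entirely. Judging from `table_to_columns` and the column names, the expected value is presumably `[1, 7, 11]` (`department.department id`, `head.head id`, `management.department id`).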
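To see how many records are affected, a small sanity check can be run over the encoded `.jsonl` file. The sketch below is not part of seq2struct. `primary_key_tables`, `primary_key_problems` and `scan_jsonl` are hypothetical helpers, and they assume (based only on the output above) that primary keys are global column indices and that `column_to_table` is keyed by the column index as a string. The check only flags duplicate entries and out-of-range indices, which is enough to catch `[11, 11, 11]`.

```python
import json
from typing import Any, Dict, List, Optional


def primary_key_tables(record: Dict[str, Any]) -> List[Optional[int]]:
    """Return the table index that owns each entry of record['primary_keys'].

    Assumption: primary keys are global column indices, and 'column_to_table'
    maps str(column index) -> table index, as in the encoded record above.
    """
    col_to_table = record["column_to_table"]
    return [col_to_table.get(str(pk)) for pk in record["primary_keys"]]


def primary_key_problems(record: Dict[str, Any]) -> List[str]:
    """Return a list of detected problems with 'primary_keys' (empty if none)."""
    problems = []
    pks = record["primary_keys"]
    n_cols = len(record["columns"])
    # The same column listed more than once is the symptom reported here.
    if len(set(pks)) != len(pks):
        problems.append(f"duplicate primary keys: {pks}")
    for pk in pks:
        # Column 0 is '*', which can never be a key.
        if not 0 < pk < n_cols:
            problems.append(f"primary key {pk} is not a valid column index")
    return problems


def scan_jsonl(path: str) -> Dict[int, List[str]]:
    """Map 1-based line number -> problems, for every record that has any."""
    bad = {}
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate a trailing blank line
            problems = primary_key_problems(json.loads(line))
            if problems:
                bad[lineno] = problems
    return bad
```

For the record shown above, `primary_key_tables` gives `[2, 2, 2]`: every "primary key" belongs to the `management` table. Calling `scan_jsonl('data/spider-20190205/nl2code-0401,output_from=false,emb=glove-42B,min_freq=50/enc/train.jsonl')` would list every other affected record in the same file.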