language/xsp/data_preprocessing/abstract_sql_converters.py foreign key direction wrong
A foreign key relation is defined by child table, parent table, child column, parent column.
I see a line here:
ForeignKeyRelation('flight', 'flight_stop', 'flight_id', 'flight_id')
However, it seems that flight_stop.flight_id should be a child of flight.flight_id (i.e. the set of values in flight_stop.flight_id is a subset of those in flight.flight_id).
It does not seem that all relations are reversed either; for example:
ForeignKeyRelation('city', 'state', 'state_code', 'state_code')
The above relation seems to be correct.
Did you use an automatic script to mine the key relations from the corpus? Here are some cases where automatic extraction rules might fail:
ForeignKeyRelation('course', 'course', 'course_id', 'course_id')
In this case, a column is its own parent, which seems a little wrong to me.
ForeignKeyRelation('program_course', 'student_record', 'course_id', 'course_id')
This is plausible, but I suspect that program_course.course_id might be a direct child of course.course_id. This problem is especially salient with the geography database.
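A quick check along these lines could flag the self-referential case automatically. This is only a sketch: the `ForeignKeyRelation` namedtuple below is a stand-in whose field names are assumed from the argument order described above, and `find_self_references` is a hypothetical helper, not something in the repository.

```python
import collections

# Stand-in for the repository's ForeignKeyRelation; the field names are an
# assumption based on the (child table, parent table, child column,
# parent column) order described in this issue.
ForeignKeyRelation = collections.namedtuple(
    "ForeignKeyRelation",
    ["child_table", "parent_table", "child_column", "parent_column"])


def find_self_references(relations):
    """Returns relations whose child column is also its own parent column."""
    return [
        r for r in relations
        if r.child_table == r.parent_table
        and r.child_column == r.parent_column
    ]


relations = [
    ForeignKeyRelation("flight", "flight_stop", "flight_id", "flight_id"),
    ForeignKeyRelation("city", "state", "state_code", "state_code"),
    ForeignKeyRelation("course", "course", "course_id", "course_id"),
]

suspicious = find_self_references(relations)
for r in suspicious:
    print("self-referential:", r)
```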
There seem to be many inconsistent foreign key directions in this file, and I wonder whether this would hurt OOD performance. Thanks!
The reversal of each foreign key relation in the file you referenced is computed and used in this function: https://github.com/google-research/language/blob/ea6a706f61c08e45a5750776062ba667ac6240c0/language/xsp/data_preprocessing/abstract_sql.py#L554. In practice, both foreign key directions are therefore considered when trying to find a path between two columns.
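To illustrate why the stored direction does not matter for path finding, here is a minimal sketch of the idea. It is not the code at the linked line: the namedtuple field names, `with_reversed`, and `find_table_path` are all hypothetical, and the search works at table level with a plain breadth-first search for simplicity.

```python
import collections

# Stand-in for the repository's ForeignKeyRelation (field names assumed).
ForeignKeyRelation = collections.namedtuple(
    "ForeignKeyRelation",
    ["child_table", "parent_table", "child_column", "parent_column"])


def with_reversed(relations):
    """Returns the relations plus a reversed copy of each one."""
    result = list(relations)
    for r in relations:
        result.append(
            ForeignKeyRelation(r.parent_table, r.child_table,
                               r.parent_column, r.child_column))
    return result


def find_table_path(relations, start, goal):
    """Breadth-first search over tables, treating every relation as two-way."""
    adjacency = collections.defaultdict(set)
    for r in with_reversed(relations):
        adjacency[r.child_table].add(r.parent_table)

    queue = collections.deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for neighbor in sorted(adjacency[path[-1]]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # The two tables are not connected.


relations = [
    ForeignKeyRelation("flight", "flight_stop", "flight_id", "flight_id"),
]
forward = find_table_path(relations, "flight", "flight_stop")
backward = find_table_path(relations, "flight_stop", "flight")
```

Because each relation is added in both directions, `forward` and `backward` both find a path, whichever table was listed as the child.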