Missing relations
I found 266 examples (context windows) in the train and dev sets that contain tokens whose root_id is "0" and whose tag_id is some TXXX, but in which no token has root_id TXXX.
For example, see the T105 tokens here:
data/source_txt/t3_physics_2_101.deft
TOKEN ROOT_ID TAG_ID RELATION
3161 -1 -1 0
. -1 -1 0
Another -1 -1 0
is -1 -1 0
what -1 -1 0
Democritus -1 -1 0
in -1 -1 0
particular -1 -1 0
believed -1 -1 0
— -1 -1 0
that -1 -1 0
there 0 T106 0
is 0 T106 0
a 0 T106 0
smallest 0 T106 0
unit 0 T106 0
that 0 T106 0
can 0 T106 0
not 0 T106 0
be 0 T106 0
further 0 T106 0
subdivided 0 T106 0
. -1 -1 0
Democritus -1 -1 0
called -1 -1 0
this T106 T194 Refers-To
the 0 T105 0
atom 0 T105 0
. -1 -1 0
We -1 -1 0
now -1 -1 0
know -1 -1 0
that -1 -1 0
atoms -1 -1 0
themselves -1 -1 0
can -1 -1 0
be -1 -1 0
subdivided -1 -1 0
, -1 -1 0
but -1 -1 0
their -1 -1 0
identity -1 -1 0
is -1 -1 0
destroyed -1 -1 0
in -1 -1 0
the -1 -1 0
process -1 -1 0
, -1 -1 0
so -1 -1 0
the -1 -1 0
Greeks -1 -1 0
were -1 -1 0
correct -1 -1 0
in -1 -1 0
a -1 -1 0
respect -1 -1 0
. -1 -1 0
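A check of this kind can be sketched roughly as follows. This is only an illustration, not the script used for the report: it assumes the four-column layout shown in the excerpt (TOKEN, ROOT_ID, TAG_ID, RELATION, separated by whitespace), and the helper names `parse_rows` and `find_orphan_roots` are made up for the example.

```python
def parse_rows(text):
    """Split whitespace-separated lines into (token, root_id, tag_id, relation).

    Assumes the four-column layout from the excerpt; other lines are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4:
            rows.append(tuple(parts))
    return rows


def find_orphan_roots(rows):
    """Return tag IDs marked as roots (root_id == "0") that no row points to."""
    root_tags = set()
    referenced = set()
    for _token, root_id, tag_id, _relation in rows:
        if root_id == "0":
            root_tags.add(tag_id)
        elif root_id != "-1":
            # Any other value is the tag ID this row's tag is related to.
            referenced.add(root_id)
    return root_tags - referenced


# A few rows taken from the excerpt above.
sample = """\
there 0 T106 0
is 0 T106 0
this T106 T194 Refers-To
the 0 T105 0
atom 0 T105 0
. -1 -1 0
"""

orphans = find_orphan_roots(parse_rows(sample))
print(sorted(orphans))
```

On these rows, T106 is referenced by the T194 token, while T105 is marked as a root but nothing points to it, so only T105 is reported.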
Thanks for reporting - I'm looking into this now. It has to do with the fix we settled on for long-distance relationships (i.e. Secondary Def --> Definition --> Term), which was to mark only the final tag in the relationship as the root. That gives relationships in the .deft files like this, where the Term is the root:
(Secondary Def, T1, T2, Supplements)
(Definition, T2, T3, Direct Defines)
(Term, T3, 0, 0)
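To make the chain concrete, here is a small sketch that follows the root IDs from any tag up to the root. It reads each tuple above as (label, tag ID, root ID, relation), which is an interpretation of the example rather than a documented format, and `chain_to_root` is a hypothetical helper.

```python
# Rows copied from the example, read as (label, tag_id, root_id, relation).
rows = [
    ("Secondary Def", "T1", "T2", "Supplements"),
    ("Definition", "T2", "T3", "Direct Defines"),
    ("Term", "T3", "0", "0"),
]


def chain_to_root(tag_id, rows):
    """Follow root_id links from tag_id until a tag whose root_id is "0"."""
    parent = {tag: root for _label, tag, root, _relation in rows}
    chain = [tag_id]
    seen = {tag_id}
    # A tag that is missing from the rows is treated as the end of the chain.
    while parent.get(chain[-1], "0") != "0":
        nxt = parent[chain[-1]]
        if nxt in seen:
            raise ValueError(f"cycle detected at {nxt}")
        chain.append(nxt)
        seen.add(nxt)
    return chain


print(chain_to_root("T1", rows))
```

Starting from T1 this walks T1, T2, T3 and stops at T3, the only tag whose root ID is 0.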
I take it back - on inspection this is actually a problem with overlapping relationships. In this case, there was a referential-definition (this) that "refers-to" the definition (there is a smallest unit that cannot be further subdivided) and also "indirect-defines" the term (the atom). Someone brought this up in the forums yesterday and we're aware of the problem. I'm working on finding a fix right now that handles this scenario without undermining our existing data format.
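The overlap can be illustrated with a short sketch. The tag IDs come from the excerpt in the report; that T194 ("this") also points at T105 ("the atom") is inferred from the description above, and the label spelling `Indirect-Defines` and the helper `overlapping_sources` are assumptions made for the example.

```python
from collections import defaultdict

# (source_tag, target_tag, relation) triples describing the intended annotation.
relations = [
    ("T194", "T106", "Refers-To"),
    ("T194", "T105", "Indirect-Defines"),  # inferred, not present in the file
]


def overlapping_sources(relations):
    """Return source tags that have more than one outgoing relation."""
    targets = defaultdict(list)
    for source, target, label in relations:
        targets[source].append((target, label))
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}


print(overlapping_sources(relations))
```

Since each token row carries a single ROOT_ID and RELATION, a source tag that shows up here can keep only one of its relations, which would leave the other target (T105 in this case) with nothing pointing to it.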
Hi, the missing-relation problem is still present in the train and dev sets (as far as I can tell I have the current version of the data; please check):
{'data/source_txt/t3_physics_2_101.deft': {'T105', 'T109', 'T134', 'T145', 'T31'},
 'data/source_txt/t6_sociology_1_101.deft': {'T125', 'T142', 'T58'},
 'data/source_txt/t1_biology_1_505.deft': {'T189', 'T195', 'T241', 'T246', 'T282', 'T283', 'T72', 'T74', 'T86'},
 'data/source_txt/t2_history_0_0.deft': {'T151', 'T162', 'T47', 'T81', 'T95'},
 'data/source_txt/t6_sociology_0_101.deft': {'T76', 'T98'},
 'data/source_txt/t2_history_2_101.deft': {'T111', 'T131'},
 'data/source_txt/t7_government_1_101.deft': {'T103', 'T116'},
 'data/source_txt/t7_government_1_404.deft': {'T13'},
 'data/source_txt/t1_biology_0_303.deft': {'T129', 'T131', 'T176', 'T26', 'T296', 'T79', 'T82', 'T9', 'T94'},
 'data/source_txt/t1_biology_1_404.deft': {'T113', 'T173', 'T194', 'T195', 'T223', 'T231', 'T36', 'T7'},
 'data/source_txt/t5_economic_1_0.deft': {'T103', 'T140', 'T154', 'T50', 'T73', 'T89', 'T95'},
 'data/source_txt/t1_biology_2_404.deft': {'T113', 'T150', 'T167', 'T205', 'T228', 'T295', 'T299', 'T42'},
 'data/source_txt/t4_psychology_2_0.deft': {'T127', 'T204', 'T209', 'T232', 'T38'},
 'data/source_txt/t3_physics_0_101.deft': {'T157', 'T174', 'T39'},
 'data/source_txt/t7_government_0_303.deft': {'T20'},
 'data/source_txt/t5_economic_0_202.deft': {'T137'},
 'data/source_txt/t5_economic_1_202.deft': {'T47'},
 'data/source_txt/t4_psychology_0_303.deft': {'T17'},
 'data/source_txt/t7_government_1_0.deft': {'T16'},
 'data/source_txt/t1_biology_2_606.deft': {'T207', 'T259', 'T28', 'T37', 'T59', 'T83'},
 'data/source_txt/t4_psychology_1_0.deft': {'T123', 'T165', 'T200', 'T216', 'T221', 'T32'},
 'data/source_txt/t2_history_2_0.deft': {'T146', 'T151', 'T179', 'T25', 'T53', 'T76'},
 'data/source_txt/t7_government_1_303.deft': {'T13'},
 'data/source_txt/t1_biology_1_303.deft': {'T105', 'T15', 'T86'},
 'data/source_txt/t7_government_0_202.deft': {'T31', 'T35'},
 'data/source_txt/t1_biology_0_101.deft': {'T131', 'T261', 'T82'},
 'data/source_txt/t4_psychology_2_101.deft': {'T198', 'T31', 'T7'},
 'data/source_txt/t4_psychology_0_202.deft': {'T102', 'T21', 'T35', 'T36', 'T83'},
 'data/source_txt/t5_economic_0_101.deft': {'T1', 'T180', 'T7', 'T86'},
 'data/source_txt/t2_history_1_0.deft': {'T110', 'T158', 'T23', 'T51', 'T69', 'T7'},
 'data/source_txt/t1_biology_2_505.deft': {'T204', 'T229', 'T36'},
 'data/source_txt/t6_sociology_0_0.deft': {'T147', 'T40', 'T54', 'T82'},
 'data/source_txt/t1_biology_2_303.deft': {'T227', 'T36', 'T61'},
 'data/source_txt/t1_biology_1_0.deft': {'T143', 'T177', 'T238', 'T27', 'T47', 'T80'},
 'data/source_txt/t1_biology_0_0.deft': {'T103', 'T105', 'T109', 'T139', 'T151', 'T193', 'T211'},
 'data/source_txt/t7_government_1_202.deft': {'T88', 'T97'},
 'data/source_txt/t1_biology_2_101.deft': {'T127', 'T236', 'T243', 'T257', 'T261'},
 'data/source_txt/t2_history_0_101.deft': {'T9', 'T95'},
 'data/source_txt/t4_psychology_0_101.deft': {'T228', 'T248', 'T272', 'T28'},
 'data/source_txt/t3_physics_1_101.deft': {'T113', 'T143', 'T212', 'T31', 'T74', 'T98'},
 'data/source_txt/t3_physics_1_0.deft': {'T123', 'T126', 'T135', 'T152', 'T34', 'T43'},
 'data/source_txt/t1_biology_0_202.deft': {'T101', 'T120', 'T151', 'T159', 'T169', 'T281', 'T292', 'T298', 'T314', 'T51', 'T52', 'T56', 'T6', 'T64', 'T70', 'T85'},
 'data/source_txt/t5_economic_2_0.deft': {'T105', 'T168', 'T171', 'T63', 'T77', 'T89'},
 'data/source_txt/t7_government_2_0.deft': {'T20', 'T31', 'T36', 'T6'},
 'data/source_txt/t1_biology_1_606.deft': {'T127', 'T136', 'T18', 'T213', 'T230', 'T28', 'T89', 'T94', 'T99'},
 'data/source_txt/t4_psychology_2_202.deft': {'T38'},
 'data/source_txt/t7_government_2_202.deft': {'T31'},
 'data/source_txt/t5_economic_2_101.deft': {'T65'},
 'data/source_txt/t7_government_0_404.deft': {'T32', 'T36', 'T43'},
 'data/source_txt/t1_biology_1_101.deft': {'T100', 'T180', 'T188', 'T254', 'T54', 'T55'},
 'data/source_txt/t6_sociology_2_101.deft': {'T31'},
 'data/source_txt/t3_physics_2_0.deft': {'T135', 'T182', 'T19', 'T8', 'T96'},
 'data/source_txt/t2_history_1_101.deft': {'T72', 'T81'},
 'data/source_txt/t1_biology_0_606.deft': {'T253', 'T3', 'T85'},
 'data/source_txt/t1_biology_0_404.deft': {'T15', 'T159', 'T232', 'T246', 'T288', 'T346', 'T38', 'T62', 'T77', 'T9'},
 'data/source_txt/t5_economic_0_0.deft': {'T145'},
 'data/source_txt/t5_economic_2_202.deft': {'T140', 'T2', 'T93'},
 'data/source_txt/t4_psychology_0_0.deft': {'T212', 'T4', 'T72', 'T78', 'T82'},
 'data/source_txt/t1_biology_2_0.deft': {'T39', 'T59', 'T72', 'T98'},
 'data/source_txt/t4_psychology_1_101.deft': {'T157', 'T178', 'T179', 'T189', 'T210'},
 'data/source_txt/t1_biology_1_202.deft': {'T116', 'T16', 'T163', 'T172', 'T271', 'T30', 'T40', 'T57'},
 'data/source_txt/t4_psychology_1_202.deft': {'T113', 'T155', 'T28', 'T4', 'T44'},
 'data/source_txt/t7_government_0_101.deft': {'T72'},
 'data/source_txt/t1_biology_2_202.deft': {'T194', 'T203', 'T230', 'T263', 'T77'},
 'data/source_txt/t3_physics_0_0.deft': {'T29'},
 'data/source_txt/t7_government_2_101.deft': {'T31'},
 'data/source_txt/t7_government_2_303.deft': {'T7', 'T9'}}
And here are the few examples that remain:
{'data/source_txt/t1_biology_1_505.deft': {'T190', 'T195', 'T243', 'T246', 'T282', 'T283'},
 'data/source_txt/t1_biology_0_303.deft': {'T129', 'T131', 'T176', 'T296', 'T78', 'T94'},
 'data/source_txt/t1_biology_0_101.deft': {'T261'},
 'data/source_txt/t4_psychology_0_101.deft': {'T228', 'T248'},
 'data/source_txt/t5_economic_2_0.deft': {'T107', 'T78'}}