OKR
Weird parsing by PropS wrapper
I encountered this example; it's about tweet 258907583165911040 from boy_scouts:
{'Entities': {'A1': ('Released', (10,)), 'A2': ('Perversion Files Set', (4, 6, 7)), 'A3': ('Released', (22,)), 'A4': ('Perversion Files Set', (16, 18, 19))}, 'Predicates': {'P1': {'Arguments': ['A1', 'A2'], 'Bare predicate': ('To Be', (8, 9)), 'Head': {'Lemma': 'Be', 'POS': 'VB', 'Surface': ('Be', [9])}, 'Template': '{A2} To Be {A1}'}, 'P2': {'Arguments': ['A3', 'A4', 'A2'], 'Bare predicate': ('To Be', (20, 21)), 'Head': {'Lemma': 'Be', 'POS': 'VB', 'Surface': ('Be', [21])}, 'Template': '{A2} {A4} To Be {A3}'}}, 'Sentence': "Boy Scouts ' Perversion ' Files Set To Be Released : Boy Scouts ' Perversion ' Files Set To Be Released"}
Note the P2 template: why is A2 also there?
I'm not sure what's going on in the input sentence. Is it:
Boy Scouts ' Perversion ' Files Set To Be Released : Boy Scouts ' Perversion ' Files Set To Be Released
It seems to repeat the same information, right?
In that case, proposition P2 is (arguments in brackets):
[Perversion Files Set] [Perversion Files Set] To Be [Released]
That doesn't seem too far off. My guess is that the parser gets confused by the unusual, repeated sentence structure.
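To see where the duplicated argument comes from, the templates can be filled in mechanically. This is a minimal sketch, assuming the wrapper output is the Python dict quoted above (trimmed to the fields used here) and that the {Ai} placeholders can be filled with str.format; `render` and `slot_tokens` are hypothetical helpers, not part of the PropS wrapper.

```python
import re

# Trimmed copy of the PropS wrapper output quoted above (only the fields used here).
entities = {
    "A1": ("Released", (10,)),
    "A2": ("Perversion Files Set", (4, 6, 7)),
    "A3": ("Released", (22,)),
    "A4": ("Perversion Files Set", (16, 18, 19)),
}
predicates = {
    "P1": {"Arguments": ["A1", "A2"], "Template": "{A2} To Be {A1}"},
    "P2": {"Arguments": ["A3", "A4", "A2"], "Template": "{A2} {A4} To Be {A3}"},
}


def render(template, entities):
    """Fill every {Ai} slot with the entity's surface text, wrapped in brackets."""
    return template.format(**{name: f"[{text}]" for name, (text, _) in entities.items()})


def slot_tokens(template, entities):
    """List (argument, token indices) for each slot, in template order."""
    return [(name, entities[name][1]) for name in re.findall(r"\{(A\d+)\}", template)]


for pid, pred in predicates.items():
    print(pid, render(pred["Template"], entities))
    print("   ", slot_tokens(pred["Template"], entities))
```

For P2 this reproduces the bracketed proposition above. The token indices suggest that A2 (4, 6, 7) comes from the first copy of the headline, while A4 (16, 18, 19) and the bare predicate (20, 21) come from the second, so P2 appears to attach a subject from each copy to the same "To Be".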
Yes, that is the sentence, and it does confuse the parser. I think we had better find a way to handle such duplicated sentences, if possible, perhaps at the preprocessing stage.
Does that happen often? If we handle this, I think it should be done in a separate cleaning stage (e.g. together with #18).
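If this ends up in a cleaning stage, one simple heuristic is to collapse a tweet that consists of the same text repeated around a separator. This is a minimal sketch, assuming the copies are split by ":" as in this tweet; `collapse_repeat` is a hypothetical helper, not existing code in OKR.

```python
def collapse_repeat(sentence: str, separator: str = ":") -> str:
    """Return one copy of `sentence` if it is the same text repeated around `separator`.

    Hypothetical cleaning step: whitespace is normalised before comparing, and the
    input is returned unchanged when the parts differ or there is no separator.
    """
    parts = [" ".join(part.split()) for part in sentence.split(separator)]
    if len(parts) > 1 and parts[0] and all(part == parts[0] for part in parts):
        return parts[0]
    return sentence


tweet = ("Boy Scouts ' Perversion ' Files Set To Be Released : "
         "Boy Scouts ' Perversion ' Files Set To Be Released")
print(collapse_repeat(tweet))
print(collapse_repeat("Boy Scouts : Files Set To Be Released"))  # parts differ, returned unchanged
```

Comparing the parts exactly (after whitespace normalisation) keeps false positives low, but it will miss near-duplicates that differ in punctuation or case; those would need a fuzzier check.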