
Weird parsing by PropS wrapper

Open kleinay opened this issue 7 years ago • 3 comments

I encountered this example - it's about tweet 258907583165911040 from boy_scouts:

{'Entities': {'A1': ('Released', (10,)),
              'A2': ('Perversion Files Set', (4, 6, 7)),
              'A3': ('Released', (22,)),
              'A4': ('Perversion Files Set', (16, 18, 19))},
 'Predicates': {'P1': {'Arguments': ['A1', 'A2'],
                       'Bare predicate': ('To Be', (8, 9)),
                       'Head': {'Lemma': 'Be', 'POS': 'VB', 'Surface': ('Be', [9])},
                       'Template': '{A2} To Be {A1}'},
                'P2': {'Arguments': ['A3', 'A4', 'A2'],
                       'Bare predicate': ('To Be', (20, 21)),
                       'Head': {'Lemma': 'Be', 'POS': 'VB', 'Surface': ('Be', [21])},
                       'Template': '{A2} {A4} To Be {A3}'}},
 'Sentence': "Boy Scouts ' Perversion ' Files Set To Be Released : Boy Scouts ' Perversion ' Files Set To Be Released"}

Note P2 template - Why is A2 also there?
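For what it's worth, the overlap can be spotted mechanically. Below is a minimal sketch that lists arguments attached to more than one predicate. It only uses the `Arguments` and `Template` fields copied from the output above; `shared_arguments` is a hypothetical helper written for this illustration, not something the PropS wrapper provides.

```python
from collections import defaultdict

# Predicate fields copied from the parse above (other keys omitted).
predicates = {
    "P1": {"Arguments": ["A1", "A2"], "Template": "{A2} To Be {A1}"},
    "P2": {"Arguments": ["A3", "A4", "A2"], "Template": "{A2} {A4} To Be {A3}"},
}


def shared_arguments(predicates):
    """Return {argument id: [predicate ids]} for arguments used by 2+ predicates."""
    owners = defaultdict(list)
    for pred_id, pred in predicates.items():
        for arg in pred["Arguments"]:
            owners[arg].append(pred_id)
    return {arg: preds for arg, preds in owners.items() if len(preds) > 1}


print(shared_arguments(predicates))  # {'A2': ['P1', 'P2']}
```

So A2, whose token indices (4, 6, 7) come from the first copy of the sentence, is attached to both P1 and P2.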

kleinay avatar Aug 18 '17 14:08 kleinay

I'm not sure what's going on in the input sentence. Is it: Boy Scouts ' Perversion ' Files Set To Be Released : Boy Scouts ' Perversion ' Files Set To Be Released

Seems to be repeating information, right?

In that case, proposition P2 is (arguments in brackets): [Perversion Files Set] [Perversion Files Set] To Be [Released]

Which doesn't seem too far off. My guess is that the parser gets confused due to the weird sentence structure.
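As a rough sketch, the bracketed reading of P2 above can be reproduced by filling each template with the entity surface forms. The `render` helper is hypothetical and written only for this illustration; the data is copied from the output in the issue, with unused keys left out.

```python
# Entities and predicates copied from the parse in the issue (unused keys omitted).
parse = {
    "Entities": {
        "A1": ("Released", (10,)),
        "A2": ("Perversion Files Set", (4, 6, 7)),
        "A3": ("Released", (22,)),
        "A4": ("Perversion Files Set", (16, 18, 19)),
    },
    "Predicates": {
        "P1": {"Arguments": ["A1", "A2"], "Template": "{A2} To Be {A1}"},
        "P2": {"Arguments": ["A3", "A4", "A2"], "Template": "{A2} {A4} To Be {A3}"},
    },
}


def render(parse, pred_id):
    """Fill a predicate template with its arguments' surface forms, in brackets."""
    pred = parse["Predicates"][pred_id]
    surfaces = {
        arg: "[" + parse["Entities"][arg][0] + "]" for arg in pred["Arguments"]
    }
    return pred["Template"].format(**surfaces)


print(render(parse, "P1"))  # [Perversion Files Set] To Be [Released]
print(render(parse, "P2"))  # [Perversion Files Set] [Perversion Files Set] To Be [Released]
```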

gabrielStanovsky avatar Aug 19 '17 17:08 gabrielStanovsky

Yes, that is the sentence, and it is indeed confusing for the parser. I think we'd better find a way to handle these duplicated sentences, if possible, maybe at the preprocessing stage.
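One possible shape for such a preprocessing step, purely as a sketch: collapse a sentence that consists of the same text repeated around a separator. It assumes the copies are joined by " : " as in this tweet, which may not hold for other duplicated inputs; `collapse_repeated_sentence` is a hypothetical name, not an existing function in the repo.

```python
def collapse_repeated_sentence(sentence, separator=" : "):
    """Return a single copy if the sentence is the same text repeated
    around `separator`; otherwise return the sentence unchanged."""
    parts = [part.strip() for part in sentence.split(separator)]
    if len(parts) > 1 and all(part == parts[0] for part in parts[1:]):
        return parts[0]
    return sentence


tweet = ("Boy Scouts ' Perversion ' Files Set To Be Released : "
         "Boy Scouts ' Perversion ' Files Set To Be Released")
print(collapse_repeated_sentence(tweet))
# Boy Scouts ' Perversion ' Files Set To Be Released
```

Sentences whose halves differ are returned as-is, so the check should be safe to run on every input before parsing.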

kleinay avatar Aug 19 '17 18:08 kleinay

Is that something that happens often? I think that if we handle this, then it should be handled in a separate cleaning stage (with e.g. #18).

gabrielStanovsky avatar Aug 19 '17 18:08 gabrielStanovsky