
Stanford CoreNLP

Open • federicoruggeri opened this issue 7 years ago • 8 comments

Dear Mr. Ferreira, could you please tell me which Stanford CoreNLP version was used to parse the sentences? I'm currently using version "2014-08-27", but the parsed dependencies are missing the "-stanford_idx" number.

Example taken from Untitle1.ipynb: nlp.parse("She didn't see the elephant")

Expected output:

{u'sentences': [{u'dependencies': [[u'root', u'ROOT-0', u'see-4'],
    [u'nsubj', u'see-4', u'She-1'],
    [u'aux', u'see-4', u'did-2'],
    [u'neg', u'see-4', u"n't-3"],
    [u'det', u'elephant-6', u'the-5'],
    [u'dobj', u'see-4', u'elephant-6']],
   u'parsetree': u"(ROOT (S (NP (PRP She)) (VP (VBD did) (RB n't) (VP (VB see) (NP (DT the) (NN elephant))))))",
   u'text': u"She didn't see the elephant",
   u'words': [[u'She',
     {u'CharacterOffsetBegin': u'0',
      u'CharacterOffsetEnd': u'3',
      u'Lemma': u'she',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'PRP'}],
    [u'did',
     {u'CharacterOffsetBegin': u'4',
      u'CharacterOffsetEnd': u'7',
      u'Lemma': u'do',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'VBD'}],
    [u"n't",
     {u'CharacterOffsetBegin': u'7',
      u'CharacterOffsetEnd': u'10',
      u'Lemma': u'not',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'RB'}],
    [u'see',
     {u'CharacterOffsetBegin': u'11',
      u'CharacterOffsetEnd': u'14',
      u'Lemma': u'see',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'VB'}],
    [u'the',
     {u'CharacterOffsetBegin': u'15',
      u'CharacterOffsetEnd': u'18',
      u'Lemma': u'the',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'DT'}],
    [u'elephant',
     {u'CharacterOffsetBegin': u'19',
      u'CharacterOffsetEnd': u'27',
      u'Lemma': u'elephant',
      u'NamedEntityTag': u'O',
      u'PartOfSpeech': u'NN'}]]}]}

Actual output from Stanford CoreNLP 2014-08-27 (the version I'm using):

{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'see'],
                                   [u'nsubj', u'see', u'She'],
                                   [u'aux', u'see', u'did'],
                                   [u'neg', u'see', u"n't"],
                                   [u'det', u'elephant', u'the'],
                                   [u'dobj', u'see', u'elephant']],
                 u'parsetree': u"(ROOT (S (NP (PRP She)) (VP (VBD did) (RB n't) (VP (VB see) (NP (DT the) (NN elephant))))))",
                 u'text': u"She didn't see the elephant",
                 u'words': [[u'She',
                             {u'CharacterOffsetBegin': u'0',
                              u'CharacterOffsetEnd': u'3',
                              u'Lemma': u'she',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'PRP'}],
                            [u'did',
                             {u'CharacterOffsetBegin': u'4',
                              u'CharacterOffsetEnd': u'7',
                              u'Lemma': u'do',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'VBD'}],
                            [u"n't",
                             {u'CharacterOffsetBegin': u'7',
                              u'CharacterOffsetEnd': u'10',
                              u'Lemma': u'not',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'RB'}],
                            [u'see',
                             {u'CharacterOffsetBegin': u'11',
                              u'CharacterOffsetEnd': u'14',
                              u'Lemma': u'see',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'VB'}],
                            [u'the',
                             {u'CharacterOffsetBegin': u'15',
                              u'CharacterOffsetEnd': u'18',
                              u'Lemma': u'the',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'DT'}],
                            [u'elephant',
                             {u'CharacterOffsetBegin': u'19',
                              u'CharacterOffsetEnd': u'27',
                              u'Lemma': u'elephant',
                              u'NamedEntityTag': u'O',
                              u'PartOfSpeech': u'NN'}]]}]}


Is it just a version issue or is it something else?

Best Wishes, Federico Ruggeri

federicoruggeri • Oct 12 '17 13:10

Hello

I will check tonight, but it was a while ago now and I may have uninstalled the Stanford parser from my Mac. What exactly is the problem?

Will

willferreira • Oct 12 '17 13:10

Dear Mr. Ferreira, the problem concerns the 'dependencies' list. For example, [u'root', u'ROOT-0', u'see-4'] (expected) differs from [u'root', u'ROOT', u'see'] (my output).

As a result, the construction of the file 'stanparse-depths.pickle' fails. More precisely, in "src/model/utils.py" the method get_stanford_idx(x) fails, since the number it is looking for is missing (in the example above, the '0' of 'ROOT-0'). This problem can be verified by running the run_calc_stan_parse_depths.py script inside the "bin" folder. Since the repository does not include the Stanford CoreNLP Python wrapper, I don't know exactly which implementation was used. I'm currently using the following one: https://github.com/dasmith/stanford-corenlp-python
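
For illustration, a parser that tolerates both token formats might look like the sketch below; the function name and the no-suffix fallback are hypothetical, not the repository's actual get_stanford_idx implementation:

def get_stanford_idx_tolerant(token):
    # Split a dependency token such as u'see-4' into (u'see', 4).
    word, sep, idx = token.rpartition('-')
    if sep and idx.isdigit():
        return word, int(idx)
    # Assumption: tokens without a '-idx' suffix (as produced by older
    # wrappers) fall back to index -1 instead of raising an error.
    return token, -1

# get_stanford_idx_tolerant(u'ROOT-0') -> (u'ROOT', 0)
# get_stanford_idx_tolerant(u'ROOT')   -> (u'ROOT', -1)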

I kindly thank you for your time, Best Wishes, Federico Ruggeri

federicoruggeri • Oct 12 '17 13:10

I will take a look and get back to you ASAP; however, over 3 years have passed since I looked at this, so I may not be able to help. What is your interest in the work?

Will

willferreira • Oct 12 '17 13:10

Dear Mr. Ferreira, I'm a student at the University of Bologna (UNIBO). I'm currently studying stance classification for my master's thesis under the guidance of Professor Torroni and researcher Marco Lippi. More precisely, my aim concerns argument structure prediction by exploiting stance classification techniques. I was curious to experiment with known classifiers, such as the one used for Emergent, on other datasets in the same research field.

Yours Sincerely, Federico Ruggeri

federicoruggeri • Oct 12 '17 14:10

I think you might find better results than mine now. Take a look at the Fake News Challenge.

willferreira • Oct 12 '17 14:10

Hi Federico,

Yes, if your goal is to run models like ours on other datasets, you will be best served by more recent code. Take a look at the following repos, for example:
https://github.com/j6mes/fnc-ensemble
https://github.com/uclmr/fakenewschallenge

Best wishes, Andreas

andreasvlachos • Oct 12 '17 14:10

Dear Mr. Ferreira and Mr. Vlachos, my aim is to predict the stance of evidence towards a given claim. Since the Emergent dataset pairs claims with articles, my idea was to apply the related classifier to another dataset, i.e. CE-ACL-14 (IBM), which contains claims and evidence extracted from articles. This is just an experiment: as far as I know, there are no corpora that couple evidence and claims while covering both opposing and supporting links between them. For this reason, a first attempt was to run the Emergent classifier on the CE-ACL-14 dataset and analyse the results.

I kindly thank you for your time and for the pointers (I was surprised to receive an answer so quickly). Please don't spend too much time on my issue if solving it is demanding; trying the Emergent classifier was just my first idea.

Yours Sincerely, Federico Ruggeri

federicoruggeri • Oct 12 '17 14:10

@federicoruggeri If this is still relevant, you can adapt the algorithm to the latest Stanford CoreNLP version (2018-10-05) like so:

# 'sentence' is one element of the 'sentences' list in the server's JSON output
for dependency in sentence['basicDependencies']:
    relationship = dependency['dep']                  # dependency label, e.g. 'nsubj'
    head_idx = int(dependency['governor']) - 1        # 0-based head token index (ROOT becomes -1)
    head = dependency['governorGloss']                # head token text
    dependent_idx = int(dependency['dependent']) - 1  # 0-based dependent token index
    dependent = dependency['dependentGloss']          # dependent token text

There is no longer a 'dependencies' key; instead, we have our choice of 'basicDependencies', 'enhancedDependencies', and 'enhancedPlusPlusDependencies'. The fields come already parsed, so no string manipulation is needed to extract the components.
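
For context, here is a minimal end-to-end sketch that rebuilds the old-style 'word-idx' strings from the new fields. It assumes a CoreNLP server running locally on port 9000 and the pycorenlp wrapper; both are illustrative assumptions, not necessarily what this project originally used:

from pycorenlp import StanfordCoreNLP

# Assumption: a CoreNLP (2018-10-05) server is already running on localhost:9000.
nlp = StanfordCoreNLP('http://localhost:9000')
output = nlp.annotate("She didn't see the elephant",
                      properties={'annotators': 'tokenize,ssplit,pos,depparse',
                                  'outputFormat': 'json'})

for sentence in output['sentences']:
    for dependency in sentence['basicDependencies']:
        # Rebuild the old-style triple, e.g. [u'nsubj', u'see-4', u'She-1'].
        print([dependency['dep'],
               '%s-%d' % (dependency['governorGloss'], dependency['governor']),
               '%s-%d' % (dependency['dependentGloss'], dependency['dependent'])])

The indices are 1-based with ROOT at 0, so the rebuilt strings (e.g. 'ROOT-0', 'see-4') match the '-stanford_idx' convention that get_stanford_idx expects.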

sapieneptus • Dec 29 '18 12:12