stanza: How to replicate the results of the Stanza constituency parser on Penn Treebank data

Hi, I'm trying to reproduce the results mentioned here for the constituency parser on Penn Treebank data. I have access to the WSJ data, and I downloaded the wsj_bert.pt model by running the following:

stanza.Pipeline(lang='en', processors='tokenize,pos,constituency', package={'constituency': 'wsj_bert'})

The model downloaded successfully and is saved here: ~/stanza_resources/en/constituency

Now I want to measure the performance of this model on the WSJ test data. I ran the command below. (I renamed test.trees to en_wsj_bert_test.mrg to keep the model name and the data name consistent.)

python -m stanza.utils.training.run_constituency en_wsj_bert --save_dir ~/stanza_resources/en/constituency --score_test

This returns a much lower score than expected, around 0.838210. I don't know where I'm going wrong, but I would like to fix it. I'm going to use this as a baseline, so I need to replicate the scores exactly as mentioned here.

Thanks for your help

Sep 12 '22 18:09 MHDBST

That should work. The first thing that comes to mind is that perhaps there are differences in the data format. What if you sent the first few lines of your version of the WSJ data to my personal email?

Sep 12 '22 19:09 AngledLuffa

Found it, I think. Your version does not have NML nodes, whereas ours does.

Our sentence 6:

( (S (NP-SBJ (NP (JJ Heavy) (NN selling)) (PP (IN of) (NP (NP (NNP Standard) (CC &) (NNP Poor) (POS 's)) (NML (CD 500) (HYPH -) (NN stock)) (NN index) (NNS futures))) (PP-LOC (IN in) (NP (NNP Chicago)))) (VP (ADVP-MNR (RB relentlessly)) (VBD beat) (NP (NNS stocks)) (ADVP-DIR (RB downward))) (. .)))

Your sentence 6:

( (S
    (NP-SBJ
      (NP (JJ Heavy) (NN selling) )
      (PP (IN of)
        (NP
          (NP (NNP Standard) (CC &) (NNP Poor) (POS 's) )
          (JJ 500-stock) (NN index) (NNS futures) ))
      (PP-LOC (IN in)
        (NP (NNP Chicago) )))
    (VP
      (ADVP-MNR (RB relentlessly) )
      (VBD beat)
      (NP (NNS stocks) )
      (ADVP-DIR (RB downward) ))
    (. .) ))
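A quick way to check which annotation style a tree file follows is to count node labels and words. The snippet below is only a sketch using plain regular expressions rather than any stanza API; `label_counts` and `leaves` are hypothetical helper names, and the two strings are the NP fragments from the sentences above.

```python
import re
from collections import Counter


def label_counts(tree_text):
    """Count node labels in PTB-style bracketed trees.

    A label is the token that immediately follows an opening parenthesis.
    """
    return Counter(re.findall(r"\(([^\s()]+)", tree_text))


def leaves(tree_text):
    """Return the words, i.e. the second token of each (TAG word) pair."""
    return re.findall(r"\([^\s()]+\s+([^\s()]+)\s*\)", tree_text)


# NP fragments copied from the two versions of sentence 6.
ours = ("(NP (NP (NNP Standard) (CC &) (NNP Poor) (POS 's)) "
        "(NML (CD 500) (HYPH -) (NN stock)) (NN index) (NNS futures))")
yours = ("(NP (NP (NNP Standard) (CC &) (NNP Poor) (POS 's) ) "
         "(JJ 500-stock) (NN index) (NNS futures) )")

for name, text in (("ours", ours), ("yours", yours)):
    counts = label_counts(text)
    print(name, "NML:", counts["NML"], "HYPH:", counts["HYPH"],
          "words:", len(leaves(text)))
```

On these fragments it reports one NML node, one HYPH tag and 9 words for our version, against no NML, no HYPH and 7 words for yours. So the two files differ in tokenization (500-stock versus 500 - stock) as well as in bracketing. Running the same counts over a whole file would show whether the difference is systematic.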

Sep 12 '22 20:09 AngledLuffa

I added a model ptb3_bert as part of 1.4.2. Would you let me know if that has a reasonable baseline? I got 95.7 on the test set with that model.

Sep 16 '22 07:09 AngledLuffa

Thanks @AngledLuffa, should I try it on the dev branch? As you suggested last time, the problem was solved by switching to the dev branch.

Sep 16 '22 17:09 MHDBST

That has since been released! First as 1.4.1, then a quick update to 1.4.2 to relax some Python dependencies.

Oct 11 '22 07:10 AngledLuffa

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Dec 10 '22 07:12 stale[bot]