transition-amr-parser

parser doesn't produce amr-unknown

Open PolKul opened this issue 3 years ago • 5 comments

I was able to train the parser as per your instructions. But when testing the trained model I found that it didn't produce an amr-unknown node. For example:

Text: Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?
# ::node	1	person	1-2
# ::node	2	architect-01	1-2
# ::node	3	facility	3-9
# ::node	5	also	10-11
# ::node	6	reside-01	11-12
# ::node	7	company	13-16
# ::node	10	name	3-9
# ::node	11	"Marine"	3-9
# ::node	12	"Corps"	3-9
# ::node	13	"Air"	3-9
# ::node	14	"Station"	3-9
# ::node	15	"Kaneohe"	3-9
# ::node	16	"Bay"	3-9
# ::node	18	name	13-16
# ::node	19	"New"	13-16
# ::node	20	"Sanno"	13-16
# ::node	21	"Hotel"	13-16
# ::root	6	reside-01
# ::edge	person	ARG0-of	architect-01	1	2	
# ::edge	architect-01	ARG1	facility	2	3	
# ::edge	reside-01	mod	also	6	5	
# ::edge	reside-01	ARG0	person	6	1	
# ::edge	reside-01	ARG1	company	6	7	
# ::edge	facility	name	name	3	10	
# ::edge	name	op1	"Marine"	10	11	
# ::edge	name	op2	"Corps"	10	12	
# ::edge	name	op3	"Air"	10	13	
# ::edge	name	op4	"Station"	10	14	
# ::edge	name	op5	"Kaneohe"	10	15	
# ::edge	name	op6	"Bay"	10	16	
# ::edge	company	name	name	7	18	
# ::edge	name	op1	"New"	18	19	
# ::edge	name	op2	"Sanno"	18	20	
# ::edge	name	op3	"Hotel"	18	21	
# ::short	{1: 'p', 2: 'a', 3: 'f', 5: 'a2', 6: 'r', 7: 'c', 10: 'n', 11: 'x0', 12: 'x1', 13: 'x2', 14: 'x3', 15: 'x4', 16: 'x5', 18: 'n2', 19: 'x6', 20: 'x7', 21: 'x8'}	
(r / reside-01
      :ARG0 (p / person
            :ARG0-of (a / architect-01
                  :ARG1 (f / facility
                        :name (n / name
                              :op1 "Marine"
                              :op2 "Corps"
                              :op3 "Air"
                              :op4 "Station"
                              :op5 "Kaneohe"
                              :op6 "Bay"))))
      :ARG1 (c / company
            :name (n2 / name
                  :op1 "New"
                  :op2 "Sanno"
                  :op3 "Hotel"))
      :mod (a2 / also))

PolKul, Aug 26 '21

Parsing the same sentence with the amrlib parser, for example, gives me this result with amr-unknown:

# ::snt Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?
(t / tenant-01
      :ARG0 (a / amr-unknown
            :ARG0-of (a2 / architect-01
                  :ARG1 (f / facility
                        :name (n / name
                              :op1 "Marine"
                              :op2 "Corps"
                              :op3 "Air"
                              :op4 "Station"
                              :op5 "Kaneohe"
                              :op6 "Bay"))))
      :ARG1 (h / hotel
            :name (n2 / name
                  :op1 "New"
                  :op2 "Sanno"))
      :mod (a3 / also))

PolKul, Aug 27 '21

It should produce amr-unknown; we use this often for question parsing.

What did you train it with? I just checked on a v0.4.2 deploy and it parses correctly. Also, do you tokenize?

ramon-astudillo, Aug 28 '21

Hi @ramon-astudillo, I was following your instructions from here for setup and training (the default action-pointer network config: bash run/run_experiment.sh configs/amr2.0-action-pointer.sh). This is the code for inference:

import string
from transition_amr_parser.parse import AMRParser

amr_parser_checkpoint = "/DATA/AMR2.0/models/exp_cofill_o8.3_act-states_RoBERTa-large-top24/_act-pos-grh_vmask1_shiftpos1_ptr-lay6-h1_grh-lay123-h2-allprev_1in1out_cam-layall-h2-abuf/ep120-seed42/checkpoint_best.pt"
parser = AMRParser.from_checkpoint(amr_parser_checkpoint)

text = "Which architect of Marine Corps Air Station Kaneohe Bay was also tenant of New Sanno hotel?"
# split on whitespace and strip punctuation from each token
words = [word.strip(string.punctuation) for word in text.split()]
annotations = parser.parse_sentences([words])
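
For reference, this is the token list the code above passes to the parser; note that the punctuation stripping also removes the trailing question mark:

print(words)
# ['Which', 'architect', 'of', 'Marine', 'Corps', 'Air', 'Station', 'Kaneohe', 'Bay', 'was', 'also', 'tenant', 'of', 'New', 'Sanno', 'hotel']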

PolKul, Aug 28 '21

Would you mind sharing your trained checkpoint to see if it makes any difference?

PolKul, Aug 28 '21

Would you mind sharing your trained checkpoint to see if it makes any difference?

I am certain it should. We are looking into sharing pre-trained models, but I cannot say anything at this point.

Also, FYI, we will update to v0.5.1 soon (after the EMNLP preprint submission deadline). The new model (Structured-BART) is the new SoTA for AMR2.0 and will be published at EMNLP 2021; a not-yet-updated preprint is here: https://openreview.net/forum?id=qjDQCHLXCNj

From experience with parsing questions, I can say that silver-data fine-tuning works well. You can parse some text corpus containing questions, filter it with a couple of rules*, and then use it as additional training data. The training scheme of silver+gold pre-training followed by gold fine-tuning seems to work best; see e.g. https://aclanthology.org/2020.findings-emnlp.288/

(*) For example, ignore all parses containing :rel (which indicates a detached subgraph) or missing amr-unknown (if you are certain there should be one); a rough sketch of such a filter is below.
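
Something like this, assuming each silver parse is available as a PENMAN string (the helper name and list variable are just placeholders):

def keep_silver_parse(penman_str, expect_unknown=True):
    # drop parses with a detached subgraph, indicated by a :rel edge
    if ":rel" in penman_str:
        return False
    # drop question parses that lack amr-unknown when one is expected
    if expect_unknown and "amr-unknown" not in penman_str:
        return False
    return True

silver_amrs = []  # PENMAN strings obtained by parsing a question corpus
filtered = [amr for amr in silver_amrs if keep_silver_parse(amr)]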

ramon-astudillo, Aug 28 '21