AMRBART
The refactored implementation for the ACL 2022 paper "Graph Pre-training for AMR Parsing and Generation". You may find our paper here (arXiv). The original implementation is available here.
News🎈
- (2022/12/10) Fixed max_length bugs in AMR parsing and updated the results.
- (2022/10/16) Released the AMRBART-v2 models, which are simpler, faster, and stronger.
Requirements
- python 3.8
- pytorch 1.8
- transformers 4.21.3
- datasets 2.4.0
- Tesla V100 or A100
We recommend using conda to manage virtual environments:
conda env update --name <env> --file requirements.yml
Data Processing
You may download the AMR corpora at LDC.
Please follow this repository to preprocess AMR graphs:
bash run-process-acl2022.sh
Usage
Our models are available on Hugging Face. Here is how to initialize an AMR parsing model in PyTorch:
from transformers import BartForConditionalGeneration
from model_interface.tokenization_bart import AMRBartTokenizer # We use our own tokenizer to process AMRs
model = BartForConditionalGeneration.from_pretrained("xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing-v2")
tokenizer = AMRBartTokenizer.from_pretrained("xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing-v2")
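Given the model and tokenizer above, a minimal parsing sketch looks as follows. This is only a rough illustration of the standard Hugging Face generate/decode loop and assumes the tokenizer accepts plain sentences like a regular BART tokenizer; the repository's inference scripts handle the AMR-specific pre- and post-processing.

import torch

sentence = "The boy wants to go to New York."
inputs = tokenizer(sentence, return_tensors="pt")

# Generate a linearized AMR graph with beam search (settings are illustrative).
with torch.no_grad():
    graph_ids = model.generate(**inputs, max_length=512, num_beams=5)

linearized_amr = tokenizer.decode(graph_ids[0], skip_special_tokens=True)
print(linearized_amr)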
Pre-training
bash run-posttrain-bart-textinf-joint-denoising-6task-large-unified-V100.sh "facebook/bart-large"
Fine-tuning
For AMR Parsing, run
bash train-AMRBART-large-AMRParsing.sh "xfbai/AMRBART-large-v2"
For AMR-to-text Generation, run
bash train-AMRBART-large-AMR2Text.sh "xfbai/AMRBART-large-v2"
Evaluation
cd evaluation
For AMR Parsing, run
bash eval_smatch.sh /path/to/gold-amr /path/to/predicted-amr
For better results, you can postprocess the predicted AMRs with the BLINK entity-linking tool, following SPRING.
For AMR-to-text Generation, run
bash eval_gen.sh /path/to/gold-text /path/to/predicted-text
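For a quick sanity check outside the provided script, you can also compute detokenized BLEU with the sacrebleu Python package; this is an assumption on our side rather than the repository's scorer, so scores may differ slightly from the reported Sacre-BLEU numbers.

import sacrebleu

# Read one sentence per line; hypotheses and references must be aligned.
with open("/path/to/predicted-text") as f:
    hypotheses = [line.strip() for line in f]
with open("/path/to/gold-text") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(bleu.score)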
Inference on your own data
If you want to run our code on your own data, first convert your data into the format shown here (a sketch of the expected input file follows the commands below), then run the corresponding script.
For AMR Parsing, run
bash inference_amr.sh "xfbai/AMRBART-large-finetuned-AMR3.0-AMRParsing-v2"
For AMR-to-text Generation, run
bash inference_text.sh "xfbai/AMRBART-large-finetuned-AMR3.0-AMR2Text-v2"
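A sketch of preparing the input file is shown below. The JSON-lines layout and the "sent"/"amr" field names are assumptions for illustration; check the example data linked above for the exact keys and file names the inference scripts expect.

import json

# Hypothetical input for AMR parsing: provide the sentence, leave the graph empty.
examples = [
    {"sent": "The boy wants to go to New York.", "amr": ""},
]
with open("data4parsing.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")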
Pre-trained Models
Pre-trained AMRBART
Setting | Params | checkpoint |
---|---|---|
AMRBART-large | 409M | model |
Fine-tuned models on AMR-to-Text Generation
Setting | BLEU(JAMR_tok) | Sacre-BLEU | checkpoint | output |
---|---|---|---|---|
AMRBART-large (AMR2.0) | 50.76 | 50.44 | model | output |
AMRBART-large (AMR3.0) | 50.29 | 50.38 | model | output |
To get the tokenized BLEU score, you need to use the scorer we provide here. We use this script to ensure comparability with previous approaches.
Fine-tuned models on AMR Parsing
Setting | Smatch(amrlib) | Smatch(amr-evaluation) | Smatch++(smatchpp) | checkpoint | output |
---|---|---|---|---|---|
AMRBART-large (AMR2.0) | 85.5 | 85.3 | 85.4 | model | output |
AMRBART-large (AMR3.0) | 84.4 | 84.2 | 84.3 | model | output |
Acknowledgements
We thank the authors of SPRING, amrlib, and BLINK for sharing the open-source scripts used in this project.
References
@inproceedings{bai-etal-2022-graph,
title = "Graph Pre-training for {AMR} Parsing and Generation",
author = "Bai, Xuefeng and
Chen, Yulong and
Zhang, Yue",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.415",
pages = "6001--6015"
}