
Training with LDC2020T02 and "amrlib" project

bjascob opened this issue on Sep 5, 2020 · 0 comments

Just FYI in case anyone's interested...

I trained the "no recategorization" branch of the model with LDC2020T02 (AMR 3.0) and got a 76.7 SMATCH score. I didn't spend much time trying to optimize hyper-parameters, and I'm still using the AMR 2.0 utils directory, so there may be additional optimizations to be had. Alternatively, the AMR 3.0 corpus may just be more complex with the new "multi-sentence" annotations, etc.

I'm also using this model in amrlib with all of the sheng-z/stog code removed. In my version of the model code there's no pre/post-processing at all, and I've switched to spaCy for annotations. I'm getting about the same ~77 SMATCH under these conditions.

amrlib is intended as a user library for parsing and generation. I've simplified some of the parsing routines for the end user, updated the code to the latest versions of penman and pytorch, sped up smatch scoring, etc. Feel free to pull portions of the revised code if you have any interest. I'd be happy to see a bit more optimization of the model in that setting, though I'm not planning on focusing on it myself.
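
For reference, here's a minimal sketch of what end-user parsing looks like in amrlib. The model has to be downloaded/installed separately per the amrlib docs, and the exact names and return format reflect my understanding of the current API, so treat them as assumptions rather than a definitive spec:

```python
# Minimal sketch of sentence-to-graph parsing with amrlib.
# Assumes a parse (stog) model is already installed; call names are
# based on the current amrlib API as I understand it.
import amrlib

stog = amrlib.load_stog_model()   # load the default sequence-to-graph model
graphs = stog.parse_sents(['The boy wants to go to New York.'])
for g in graphs:
    print(g)                      # PENMAN-format AMR graph string
```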

The library also includes a Huggingface T5 model re-trained for graph-to-sentence generation that gets a 43 BLEU on LDC2020T02. It's a lot easier coding-wise than jcyk/gtos and amazingly effective.
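
Generation follows the same pattern. Again, this is a sketch under the assumption that the generate (gtos) model is installed and that the call signature matches the current amrlib API:

```python
# Minimal sketch of graph-to-sentence generation with the T5-based model.
# 'graphs' are PENMAN-format AMR strings, e.g. the output of parse_sents().
import amrlib

gtos = amrlib.load_gtos_model()
sents, clips = gtos.generate(graphs)   # returns generated sentences plus clip flags
print(sents[0])
```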
