Standard_Korean_GEC About m2 score evaluation

About m2 score evaluation

Open soyoung97 opened this issue 1 year ago • 1 comments

Recently, someone asked about the m2 evaluation by mail. (By the way, I've recently moved from KAIST to SNU, so [email protected] OR [email protected] is preferred over [email protected] or [email protected].) I thought sharing the response might be helpful to someone else, so I'm copying and pasting it here:

Jan 02 '24 07:01 soyoung97

About the question: for GLEU evaluation, see the README file at https://github.com/soyoung97/Standard_Korean_GEC/tree/main/eval.

About the m2 file evaluation: (I wrote the code a long time ago, so I'm not sure whether it still works.) I ran the m2 scorer during the training step. (The full evaluation output and code, with links to the checkpoints, are at https://docs.google.com/spreadsheets/d/1II_BB10YPijp1Rgw3ZgQElvv6pw7xINOdTpJbAPz484/edit#gid=0)

For example, running the following command trains the model and outputs the m2 score at the end of each training epoch:

python3 src/KoBART-gec/train.py --train_mode normal --data union --default_root_dir '../../output/union' --max_epochs 10 --lr 3e-05 --from_pretrained '' --SEED 1 --batch_size 64 --dropout 0.1

If you're only looking for the evaluation code itself, see https://github.com/soyoung97/Standard_Korean_GEC/blob/main/src/KoBART-gec/train.py#L310. I ran the command in the following format:

./metric/m2scorer/m2scorer {directory}/hypothesis_{total_loss}.txt ../../extract_data/{self.args.data}/{self.args.data}_{mode}.m2 > {directory}/m2score.txt

You may need to install some dependencies. The m2scorer code is at https://github.com/soyoung97/Standard_Korean_GEC/blob/main/src/KoBART-gec/metric/m2scorer/scripts/m2scorer.py.
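If it helps, the command format above could be wrapped in a small Python helper along these lines. This is only a rough sketch, not the code in train.py: the function names are made up for illustration, the example values passed for `directory`, `total_loss`, and `mode` are placeholders, and the parser assumes the scorer prints lines such as "Precision   : 0.5000", "Recall      : 0.2500", and "F_0.5       : 0.4167", which should be checked against your actual m2score.txt.

```python
import re
import subprocess
from pathlib import Path


def build_m2_command(directory, total_loss, data, mode,
                     scorer="./metric/m2scorer/m2scorer",
                     data_root="../../extract_data"):
    """Build the scorer argv and output path, mirroring the quoted command format.

    The default scorer and data_root paths are copied from the command above;
    they are relative to wherever the command is run from.
    """
    hypothesis = f"{directory}/hypothesis_{total_loss}.txt"
    gold = f"{data_root}/{data}/{data}_{mode}.m2"
    return [scorer, hypothesis, gold], f"{directory}/m2score.txt"


def run_m2_scorer(argv, output_path):
    """Run the scorer and redirect its stdout to output_path (like `> m2score.txt`)."""
    with open(output_path, "w", encoding="utf-8") as out:
        subprocess.run(argv, stdout=out, check=True)
    return Path(output_path).read_text(encoding="utf-8")


def parse_m2_output(text):
    """Collect 'Precision', 'Recall', and 'F_<beta>' values from the scorer output.

    Assumes one 'Name : value' pair per line; unrelated lines are ignored.
    """
    scores = {}
    for line in text.splitlines():
        match = re.match(r"\s*(Precision|Recall|F_[\d.]+)\s*:\s*([\d.]+)", line)
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores
```

A call might look like `argv, out = build_m2_command("../../output/union", "0.5", "union", "val")`, followed by `parse_m2_output(run_m2_scorer(argv, out))`; the `"0.5"` and `"val"` values here are placeholders rather than names confirmed by the repository.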

I hope this helps!

Best regards, Soyoung Yoon

-----Original Message-----
From:
To: [email protected]
Subject: KAGAS M2 question

Dear Soyoung,

I'm trying to use your KAGAS tool to get per-error-type performance for a model in my research, and I was wondering how you got the M^2 scores for each error type, as in Table 7 of the "Towards standardizing Korean Grammatical Error Correction: Datasets and Annotation" paper. Did you create an M2 file using KAGAS for each error type and then evaluate the model output against that M2 file, or did you do something else?

Looking forward to hearing from you

Jan 02 '24 07:01 soyoung97
