gen-arg
Only ~10 F1 score on the WikiEvents dataset
Hi,
I tried to follow scripts/train_kairos.sh and scripts/test_kairos.sh but only got low performance, as follows:
Role identification: P: 16.88, R: 4.456, F: 7.18
Role: P: 15.58, R: 4.21, F: 6.63
Coref Role identification: P: 19.48, R: 5.26, F: 8.29
Coref Role: P: 15.58, R: 4.21, F: 6.63
Even when I trained for more epochs, I could only get an F1 score around 10. Is anything going wrong?
By the way, I failed to download the checkpoint you shared on S3 due to a network error; is there any other way to acquire these files?
Thanks.
Hi Changhy,
Sorry about this. I checked the code, and it seems the problem is in the test_kairos.sh script: --mark_trigger is an essential argument. (Note that if you have a preprocessed_KAIROS directory, the model will read directly from that directory, and this option no longer matters.)
By the way, the fixed scripts are uploaded.
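The caching behavior described above can be sketched as follows. This is a hypothetical illustration of the pattern, not the repo's actual loader code: once a preprocessed file exists, it is read back directly and preprocessing-only flags such as --mark_trigger silently have no effect, so a stale cache must be deleted to pick up new flags.

```python
import json
from pathlib import Path

def load_split(split, preprocess, preprocessed_dir="preprocessed_KAIROS"):
    """Sketch of cache-or-preprocess loading (names are illustrative).

    If the cached file exists, it is returned as-is; the `preprocess`
    callable (where flags like --mark_trigger would take effect) is
    only invoked on a cache miss.
    """
    cache = Path(preprocessed_dir) / f"{split}.jsonl"
    if cache.exists():
        # Cache hit: preprocessing flags are never consulted here.
        return [json.loads(line) for line in cache.read_text().splitlines()]
    examples = preprocess(split)  # flags only matter on this branch
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text("\n".join(json.dumps(ex) for ex in examples))
    return examples
```

The practical consequence: if the preprocessed_KAIROS directory was built from a run without --mark_trigger, delete it before re-running so the preprocessing branch executes again.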
Hi, even though I have a preprocessed_KAIROS directory, I still get an F1 score around 10. May I know how I can fix it? Here's my test_kairos.sh script.
```bash
#!/usr/bin/env bash
set -e
set -x

CKPT_NAME=gen-KAIROS
MODEL=constrained-gen

rm -rf checkpoints/${CKPT_NAME}-pred

python train.py --model=$MODEL --ckpt_name=${CKPT_NAME}-pred \
    --load_ckpt=checkpoints/${CKPT_NAME}/epoch=2.ckpt \
    --dataset=KAIROS \
    --eval_only \
    --mark_trigger \
    --train_file=data/wikievents/train.jsonl \
    --val_file=data/wikievents/dev.jsonl \
    --test_file=data/wikievents/test.jsonl \
    --coref_dir=data/wikievents/coref \
    --train_batch_size=4 \
    --eval_batch_size=4 \
    --learning_rate=3e-5 \
    --accumulate_grad_batches=4 \
    --num_train_epochs=3

python src/genie/scorer.py --gen-file=checkpoints/$CKPT_NAME-pred/predictions.jsonl \
    --test-file=data/wikievents/test.jsonl \
    --dataset=KAIROS \
    --coref-file=data/wikievents/coref/test.jsonlines \
    --coref
```
A quick comparison shows that you are missing the --head-only flag in the scoring script.
Can you double-check the checkpoints/$CKPT_NAME-pred/predictions.jsonl file to see if the output looks normal? (You can also post a few lines here for me to check.)
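For context, --head-only makes the scorer compare only the head word of each argument span instead of the full string (the thread below indicates the repo's scorer finds heads with spaCy's en_core_web_sm). A toy sketch of the idea, using the last token as a rough stand-in for the syntactic head of an English noun phrase:

```python
def head_word(span):
    # Crude stand-in for spaCy's dependency-parse head: for English
    # noun phrases the head is usually the final token.
    return span.split()[-1].lower()

def spans_match(pred, gold, head_only=False):
    """Compare two argument spans either exactly or by head word only."""
    if head_only:
        return head_word(pred) == head_word(gold)
    return pred.lower() == gold.lower()

# "car bomb" vs. "the bomb": exact match fails, head-only match succeeds.
print(spans_match("car bomb", "the bomb"))                  # False
print(spans_match("car bomb", "the bomb", head_only=True))  # True
```

Head-only matching is more lenient, which is why its scores are usually a bit higher than exact-match scores.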
Hi, thank you for your reply!
According to the results, running with or without the --head-only flag does not affect the F1 score that much.
Also, the output of the checkpoints/$CKPT_NAME-pred/predictions.jsonl file looks normal.
Below are my results and first 10 lines of the predictions.jsonl file.
My results
Evaluation by matching head words only....
Role identification: P: 29.17, R: 4.99, F: 8.52
Role: P: 26.04, R: 4.46, F: 7.61
Coref Role identification: P: 31.25, R: 5.35, F: 9.13
Coref Role: P: 28.12, R: 4.81, F: 8.22

Without --head-only...
Role identification: P: 27.08, R: 4.63, F: 7.91
Role: P: 25.00, R: 4.28, F: 7.31
Coref Role identification: P: 31.25, R: 5.35, F: 9.13
Coref Role: P: 28.12, R: 4.81, F: 8.22
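As a sanity check on these printouts: the F column is just the harmonic mean of P and R, so the pattern here (reasonable precision, very low recall dragging F down to ~8) can be verified by hand:

```python
def f1(p, r):
    # Harmonic mean of precision and recall (both in percent).
    return 2 * p * r / (p + r) if p + r else 0.0

# Head-only role identification from the results above: P=29.17, R=4.99.
print(round(f1(29.17, 4.99), 2))  # 8.52, matching the reported F
```

The low recall with moderate precision suggests the model is producing few (but plausible) arguments, which is consistent with a preprocessing or decoding problem rather than a scoring bug alone.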
Outputs of checkpoints/KAIROS-pred/predictions.jsonl
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers died at
The predictions should include the special token `<arg>`.
```
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " <arg> detonated or exploded <arg> explosive device using <arg> to attack <arg> target at training center place", "gold": " <arg> detonated or exploded <arg> explosive device using <arg> to attack <arg> target at training center place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " people died at <arg> place from <arg> medical issue, killed by <arg> killer", "gold": " people died at <arg> place from <arg> medical issue, killed by <arg> killer"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " attackers detonated or exploded <arg> explosive device using <arg> to attack <arg> target at campus place", "gold": " <arg> detonated or exploded <arg> explosive device using <arg> to attack <arg> target at <arg> place"}
```
My bad, I don't know why the special token disappeared after pasting it into the GitHub comment. The predictions look exactly as they are supposed to.
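One way to rule out paste corruption entirely is to check the file itself rather than a copied snippet. A small sketch (the path and field names follow the examples above) that counts how many predicted templates contain the `<arg>` placeholder:

```python
import json

def count_arg_tokens(path):
    """Return (lines containing '<arg>' in the prediction, total lines).

    A healthy predictions.jsonl should have the placeholder in most
    predicted templates, since unfilled slots are emitted as '<arg>'.
    """
    with_arg = total = 0
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            total += 1
            if "<arg>" in rec["predicted"]:
                with_arg += 1
    return with_arg, total
```

Running this on checkpoints/$CKPT_NAME-pred/predictions.jsonl confirms in seconds whether the special tokens survived generation.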
@Changhy1996 Hi, may I know did you solve this issue? If so, please kindly let me know how did you solve it~
I suspect something is wrong with the scorer.py function. What is the spacy version that you are using?
Hi! The spaCy version that I am using is 3.5.1, and the other package versions are as follows.

torch 1.11.0+cu113
spacy 3.5.1
transformers 4.26.1
pytorch-lightning 1.9.4
torch-struct 0.5
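Since the suspected culprit is a version mismatch, a small helper makes it easy to dump all the relevant versions in one go (the package list is just the set mentioned above; anything not installed is reported as None):

```python
from importlib import metadata

def report_versions(packages):
    """Return {package: installed version or None} for each name given."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print(report_versions(["spacy", "torch", "transformers", "pytorch-lightning"]))
```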
Hi, can we use en_core_web_trf instead of en_core_web_sm?
I've uploaded a copy of my prediction results to outputs/wikievents-pointer-pred/predictions.jsonl. Try running the scorer.py function locally and see if you get the results in Table 5 of the paper.
It works, thank you!