gen-arg
Only ~10 F1 score on the WikiEvents dataset
Hi,
I tried to follow scripts/train_kairos.sh and scripts/test_kairos.sh but only got low performance, as follows:
Role identification: P: 16.88, R: 4.456, F: 7.18
Role: P: 15.58, R: 4.21, F: 6.63
Coref Role identification: P: 19.48, R: 5.26, F: 8.29
Coref Role: P: 15.58, R: 4.21, F: 6.63
Even when I trained for more epochs, I could only get an F1 score around 10. Is anything going wrong?
By the way, I failed to download the checkpoint you shared on S3 due to a network error; is there any other way to acquire these files?
Thanks.
Hi Changhy,
Sorry about this. I checked the code, and it seems the problem is in the test_kairos.sh script: --mark_trigger is an essential argument. (Note that if you have a preprocessed_KAIROS directory, the model will read directly from that directory, and this option no longer matters.)
By the way, the fixed scripts are uploaded.
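The caching behavior described above can be sketched as follows. This is a hypothetical illustration of the pattern, not the repo's actual loader code: once a preprocessed file exists, it is read back directly and preprocessing-only flags such as --mark_trigger silently have no effect, so a stale cache must be deleted to pick up new flags.

```python
import json
from pathlib import Path

def load_split(split, preprocess, preprocessed_dir="preprocessed_KAIROS"):
    """Sketch of cache-or-preprocess loading (names are illustrative).

    If the cached file exists, it is returned as-is; the `preprocess`
    callable (where flags like --mark_trigger would take effect) is
    only invoked on a cache miss.
    """
    cache = Path(preprocessed_dir) / f"{split}.jsonl"
    if cache.exists():
        # Cache hit: preprocessing flags are never consulted here.
        return [json.loads(line) for line in cache.read_text().splitlines()]
    examples = preprocess(split)  # flags only matter on this branch
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text("\n".join(json.dumps(ex) for ex in examples))
    return examples
```

The practical consequence: if the preprocessed_KAIROS directory was built from a run without --mark_trigger, delete it before re-running so the preprocessing branch executes again.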
Hi, even though I have a preprocessed_KAIROS directory, I still get an F1 score around 10. May I know how I can fix it? Here's my test_kairos.sh script.
```bash
#!/usr/bin/env bash
set -e
set -x

CKPT_NAME=gen-KAIROS
MODEL=constrained-gen

rm -rf checkpoints/${CKPT_NAME}-pred

python train.py --model=$MODEL --ckpt_name=${CKPT_NAME}-pred \
    --load_ckpt=checkpoints/${CKPT_NAME}/epoch=2.ckpt \
    --dataset=KAIROS \
    --eval_only \
    --mark_trigger \
    --train_file=data/wikievents/train.jsonl \
    --val_file=data/wikievents/dev.jsonl \
    --test_file=data/wikievents/test.jsonl \
    --coref_dir=data/wikievents/coref \
    --train_batch_size=4 \
    --eval_batch_size=4 \
    --learning_rate=3e-5 \
    --accumulate_grad_batches=4 \
    --num_train_epochs=3

python src/genie/scorer.py --gen-file=checkpoints/$CKPT_NAME-pred/predictions.jsonl \
    --test-file=data/wikievents/test.jsonl \
    --dataset=KAIROS \
    --coref-file=data/wikievents/coref/test.jsonlines \
    --coref
```
A quick comparison shows that you are missing the --head-only flag in the scoring script.
Can you double-check the checkpoints/$CKPT_NAME-pred/predictions.jsonl file to see if the output looks normal? (You can also post a few lines here for me to check.)
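For context, --head-only makes the scorer compare only the head word of each argument span instead of the full string (the thread below indicates the repo's scorer finds heads with spaCy's en_core_web_sm). A toy sketch of the idea, using the last token as a rough stand-in for the syntactic head of an English noun phrase:

```python
def head_word(span):
    # Crude stand-in for spaCy's dependency-parse head: for English
    # noun phrases the head is usually the final token.
    return span.split()[-1].lower()

def spans_match(pred, gold, head_only=False):
    """Compare two argument spans either exactly or by head word only."""
    if head_only:
        return head_word(pred) == head_word(gold)
    return pred.lower() == gold.lower()

# "car bomb" vs. "the bomb": exact match fails, head-only match succeeds.
print(spans_match("car bomb", "the bomb"))                  # False
print(spans_match("car bomb", "the bomb", head_only=True))  # True
```

Head-only matching is more lenient, which is why its scores are usually a bit higher than exact-match scores.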
Hi, thank you for your reply!
According to the results, running with or without the --head-only flag does not affect the F1 score that much.
Also, the output of the checkpoints/$CKPT_NAME-pred/predictions.jsonl file looks normal.
Below are my results and first 10 lines of the predictions.jsonl file.
My results
Evaluation by matching head words only....
Role identification: P: 29.17, R: 4.99, F: 8.52
Role: P: 26.04, R: 4.46, F: 7.61
Coref Role identification: P: 31.25, R: 5.35, F: 9.13
Coref Role: P: 28.12, R: 4.81, F: 8.22

Without --head-only...
Role identification: P: 27.08, R: 4.63, F: 7.91
Role: P: 25.00, R: 4.28, F: 7.31
Coref Role identification: P: 31.25, R: 5.35, F: 9.13
Coref Role: P: 28.12, R: 4.81, F: 8.22
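As a sanity check on these printouts: the F column is just the harmonic mean of P and R, so the pattern here (reasonable precision, very low recall dragging F down to ~8) can be verified by hand:

```python
def f1(p, r):
    # Harmonic mean of precision and recall (both in percent).
    return 2 * p * r / (p + r) if p + r else 0.0

# Head-only role identification from the results above: P=29.17, R=4.99.
print(round(f1(29.17, 4.99), 2))  # 8.52, matching the reported F
```

The low recall with moderate precision suggests the model is producing few (but plausible) arguments, which is consistent with a preprocessing or decoding problem rather than a scoring bug alone.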
Outputs of checkpoints/KAIROS-pred/predictions.jsonl
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " soldiers died at
The predictions should include the special token `<arg>`.
```
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " <arg> detonated or exploded <arg> explosive device using <arg> to attack <arg> target at training center place", "gold": " <arg> detonated or exploded <arg> explosive device using <arg> to attack <arg> target at training center place"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " people died at <arg> place from <arg> medical issue, killed by <arg> killer", "gold": " people died at <arg> place from <arg> medical issue, killed by <arg> killer"}
{"doc_key": "wiki_mass_car_bombings_1_news_8", "predicted": " attackers detonated or exploded <arg> explosive device using <arg> to attack <arg> target at campus place", "gold": " <arg> detonated or exploded <arg> explosive device using <arg> to attack <arg> target at <arg> place"}
```
My bad, I don't know why the special token disappeared after pasting it into the GitHub comment. The predictions look exactly as they are supposed to.
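One way to rule out paste corruption entirely is to check the file itself rather than a copied snippet. A small sketch (the path and field names follow the examples above) that counts how many predicted templates contain the `<arg>` placeholder:

```python
import json

def count_arg_tokens(path):
    """Return (lines containing '<arg>' in the prediction, total lines).

    A healthy predictions.jsonl should have the placeholder in most
    predicted templates, since unfilled slots are emitted as '<arg>'.
    """
    with_arg = total = 0
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            total += 1
            if "<arg>" in rec["predicted"]:
                with_arg += 1
    return with_arg, total
```

Running this on checkpoints/$CKPT_NAME-pred/predictions.jsonl confirms in seconds whether the special tokens survived generation.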
@Changhy1996 Hi, may I know did you solve this issue? If so, please kindly let me know how did you solve it~
I suspect something is wrong with the scorer.py function. What is the spacy version that you are using?
Hi! The spaCy version that I am using is 3.5.1, and the other package versions are as follows.

torch 1.11.0+cu113
spacy 3.5.1
transformers 4.26.1
pytorch-lightning 1.9.4
torch-struct 0.5
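Since the suspected culprit is a version mismatch, a small helper makes it easy to dump all the relevant versions in one go (the package list is just the set mentioned above; anything not installed is reported as None):

```python
from importlib import metadata

def report_versions(packages):
    """Return {package: installed version or None} for each name given."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print(report_versions(["spacy", "torch", "transformers", "pytorch-lightning"]))
```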
Hi, can we use en_core_web_trf instead of en_core_web_sm?
I've uploaded a copy of my prediction results to outputs/wikievents-pointer-pred/predictions.jsonl. Try running the scorer.py function locally and see if you get the results in Table 5 of the paper.
It works, thank you!