Failed to reproduce the results
Thank you for the released code. I tried to follow your instructions to train the model, but I cannot reproduce the results reported in the paper. The F1 scores I obtained on the Twitter 15 and Twitter 17 datasets are as follows:

I have tried running the do_test command, but the F1 score does not change significantly. Moreover, I have run the experiment more than 5 times on each dataset and did not find a significant difference in the scores. Do you have any suggestions?
Thank you for your questions.
- I conducted my experiments under the following hardware conditions:
  Maybe the difference in hardware leads to the gap in results? (An environment-dump sketch for comparing setups follows after this comment.)
- Have you tried evaluating with the twitter2015 checkpoint I provided? I checked it again just now and got the following result:

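To make the hardware comparison concrete, here is a minimal sketch (not part of the repository) that dumps the environment details usually worth comparing between two setups; `dump_environment` is a made-up name, and the script only assumes PyTorch and, optionally, `nvidia-smi` on the PATH.

```python
# Minimal environment-dump sketch for comparing two training setups.
import platform
import subprocess

import torch


def dump_environment():
    """Print the software/hardware details most relevant to reproducibility."""
    print("python       :", platform.python_version())
    print("torch        :", torch.__version__)
    print("cuda (torch) :", torch.version.cuda)
    print("cudnn        :", torch.backends.cudnn.version())
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(f"gpu[{i}]       :", torch.cuda.get_device_name(i))
    try:
        # Driver version as reported by the NVIDIA tools, if installed.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        print("driver       :", out.stdout.strip())
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("driver       : nvidia-smi not available")


if __name__ == "__main__":
    dump_environment()
```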
My CUDA version and driver version are identical to yours. I tried your checkpoint and the score matches yours (by the way, the checkpoint is saved on cuda:3, which is not ideal when loading on other machines). However, I still have no idea how to train such a model myself. Also, could you provide a checkpoint for twitter2017?
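Regarding the cuda:3 remark: a checkpoint saved from a specific GPU can be remapped at load time with `map_location`. The sketch below is a generic illustration, assuming the file is a plain `state_dict` saved with `torch.save`; the path and the model construction are placeholders, not the repository's actual code.

```python
import torch

CKPT_PATH = "twitter2015_checkpoint.pt"  # placeholder path

# Remap tensors that were saved on cuda:3 onto whatever device is available,
# so the checkpoint can be evaluated on a single-GPU or CPU-only machine.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
state_dict = torch.load(CKPT_PATH, map_location=device)

# model = build_model(...)        # construct the model exactly as in training
# model.load_state_dict(state_dict)
# model.to(device).eval()
```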
- I have retrained the twitter2017 model following the instructions in `README.md` and got the following result:

Do you keep the same hyperparameters as mine?
- I have updated `README.md`, where a checkpoint of twitter2017 is now provided.
- Other minor updates:
  - alter the `MISC` type in twitter2017 (`./my_data/twitter2017`) to `OTHER` to keep it consistent with twitter2015 (a tag-mapping sketch follows after this list)
  - replace the broken images of twitter2017 with a specific image as described in our paper, and update the link of `twitter2017_img.tar.gz`
  - add the `tqdm` dependency in `README.md: Install`
  - add some minor changes to `ddp_mmner.py`
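On the `MISC`-to-`OTHER` change: the actual conversion script is not shown in this thread, so the sketch below is only an illustration under the assumption that the twitter2017 files are CoNLL-style, one `token<TAB>tag` pair per line with blank lines between sentences; the file layout and the `convert` helper are assumptions, not the repository's documented format.

```python
import sys

# Map the twitter2017 MISC tags onto OTHER so the label set matches twitter2015.
TAG_MAP = {"B-MISC": "B-OTHER", "I-MISC": "I-OTHER"}


def convert(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            stripped = line.rstrip("\n")
            fields = stripped.rsplit(maxsplit=1)
            if len(fields) == 2:                 # "token<whitespace>tag"
                token, tag = fields
                fout.write(f"{token}\t{TAG_MAP.get(tag, tag)}\n")
            else:                                # blank or malformed line: copy through
                fout.write(stripped + "\n")


if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```

Usage would be something like `python convert_tags.py my_data/twitter2017/train.txt train.converted.txt`, with the file names as placeholders.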