Failed to reproduce the results
Thank you for the released code. I tried to follow your instructions to train the model, but I cannot reproduce the results reported in the paper. The F1 scores I obtained on the Twitter 15 and Twitter 17 datasets are as follows:

I have tried running the do_test command, but the F1 score does not change significantly. Moreover, I have run the experiment more than 5 times on each dataset and did not find a significant difference in the scores. Do you have any suggestions?
Thank you for your questions.
- I conducted my experiments under the following hardware conditions:
  Maybe the difference in hardware leads to the gap in results? (An environment-dump sketch for comparing setups follows after this comment.)
- Have you tried evaluating with the twitter2015 checkpoint I provided? I checked it again just now and got the following result:

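To make the hardware comparison concrete, here is a minimal sketch (not part of the repository) that dumps the environment details usually worth comparing between two setups; `dump_environment` is a made-up name, and the script only assumes PyTorch and, optionally, `nvidia-smi` on the PATH.

```python
# Minimal environment-dump sketch for comparing two training setups.
import platform
import subprocess

import torch


def dump_environment():
    """Print the software/hardware details most relevant to reproducibility."""
    print("python       :", platform.python_version())
    print("torch        :", torch.__version__)
    print("cuda (torch) :", torch.version.cuda)
    print("cudnn        :", torch.backends.cudnn.version())
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(f"gpu[{i}]       :", torch.cuda.get_device_name(i))
    try:
        # Driver version as reported by the NVIDIA tools, if installed.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        print("driver       :", out.stdout.strip())
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("driver       : nvidia-smi not available")


if __name__ == "__main__":
    dump_environment()
```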
My CUDA version and driver version are identical to yours. I tried your checkpoint and the score matches yours (by the way, the checkpoint is saved on cuda:3, which is not ideal when loading on other machines). However, I still have no idea how to train such a model myself. Also, could you provide a checkpoint for twitter2017?
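Regarding the cuda:3 remark: a checkpoint saved from a specific GPU can be remapped at load time with `map_location`. The sketch below is a generic illustration, assuming the file is a plain `state_dict` saved with `torch.save`; the path and the model construction are placeholders, not the repository's actual code.

```python
import torch

CKPT_PATH = "twitter2015_checkpoint.pt"  # placeholder path

# Remap tensors that were saved on cuda:3 onto whatever device is available,
# so the checkpoint can be evaluated on a single-GPU or CPU-only machine.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
state_dict = torch.load(CKPT_PATH, map_location=device)

# model = build_model(...)        # construct the model exactly as in training
# model.load_state_dict(state_dict)
# model.to(device).eval()
```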
- I have retrained the twitter2017 model following the instructions in `README.md` and got the following result:

Do you keep the same hyperparameters as mine?
- I have updated `README.md`, where a checkpoint of twitter2017 is now provided.
- Other minor updates:
  - alter the `MISC` type in twitter2017 (`./my_data/twitter2017`) to `OTHER` to keep it consistent with twitter2015 (a tag-mapping sketch follows after this list)
  - replace the broken images of twitter2017 with a specific image as described in our paper, and update the link of `twitter2017_img.tar.gz`
  - add the `tqdm` dependency in `README.md: Install`
  - add some minor changes to `ddp_mmner.py`
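On the `MISC`-to-`OTHER` change: the actual conversion script is not shown in this thread, so the sketch below is only an illustration under the assumption that the twitter2017 files are CoNLL-style, one `token<TAB>tag` pair per line with blank lines between sentences; the file layout and the `convert` helper are assumptions, not the repository's documented format.

```python
import sys

# Map the twitter2017 MISC tags onto OTHER so the label set matches twitter2015.
TAG_MAP = {"B-MISC": "B-OTHER", "I-MISC": "I-OTHER"}


def convert(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            stripped = line.rstrip("\n")
            fields = stripped.rsplit(maxsplit=1)
            if len(fields) == 2:                 # "token<whitespace>tag"
                token, tag = fields
                fout.write(f"{token}\t{TAG_MAP.get(tag, tag)}\n")
            else:                                # blank or malformed line: copy through
                fout.write(stripped + "\n")


if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])
```

Usage would be something like `python convert_tags.py my_data/twitter2017/train.txt train.converted.txt`, with the file names as placeholders.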