BDANN-IJCNN2020

About the Twitter dataset

Open gehong-coder opened this issue 2 years ago • 9 comments

My question is about the counts of true and false posts in the Twitter dataset: how were the numbers 7021 and 5974 obtained, and why does our reproduction give a different result?

gehong-coder, Jun 12 '22 15:06

We only consider posts that have a corresponding image.
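
As an illustration only (not the repository's code), a filter of this kind might look like the sketch below. The `image_id` field, the post dictionaries, and the `posts_with_images` helper are hypothetical names chosen for the example; the actual data layout may differ.

```python
def posts_with_images(posts, available_image_ids):
    """Keep only the posts whose image is actually available.

    posts: list of dicts, each with a hypothetical "image_id" key.
    available_image_ids: iterable of image ids found on disk.
    """
    available = set(available_image_ids)  # set gives O(1) membership checks
    return [post for post in posts if post["image_id"] in available]


# Toy data, invented for the example.
example_posts = [
    {"text": "post a", "image_id": "img_1", "label": 1},
    {"text": "post b", "image_id": "img_2", "label": 0},
    {"text": "post c", "image_id": "img_9", "label": 1},
]
kept = posts_with_images(example_posts, ["img_1", "img_2", "img_3"])
print(len(kept))  # 2
```

Counting labels after a filter like this, rather than before it, is one way the statistics can end up differing between reproductions.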

xiaolan98, Jun 13 '22 01:06

Hello author, if you follow EANN, why is the size of this Twitter dataset inconsistent with EANN? You report 7021 and 5974, while EANN reports 7898 and 6026, so I am very confused. I keep having doubts about this dataset 😭. I hope you can help me, and thank you for your continued replies 😊😊 (^ω^).

gehong-coder, Jun 26 '22 06:06

Hi, I follow the EANN setting, but they didn't release their code for the Twitter dataset, so I wrote the statistics based on my reproduced results. Yes, this is actually a very messy dataset. It also contains many texts that are not in English, which I translated with the Google Translate API. The results are in the 'cleaned_train_text.pkl' and 'cleaned_test_text.pkl' files. You can just use them.

xiaolan98, Jun 27 '22 04:06

Hello author, I looked at the data you cleaned. cleaned_train_text.pkl has 15630 entries and cleaned_test_text.pkl has 2229. The image folders contain: image_test: 104 photos, image_train: 363 photos, image_val: 50 photos. Filtering the data down to the posts that have images gives the following results.

  1. If validate is test: the train dataset has 11181 labels (6306 rumor, 4875 non-rumor), and the validate dataset, which is reported twice in this setting, has 1492 labels (492 rumor, 1000 non-rumor). Cumulative: rumor = 6306 + 492 = 6798, non-rumor = 4875 + 1000 = 5875. Is this number correct? Or should the validate dataset be counted twice, giving rumor = 6306 + 492 + 492 = 7290 and non-rumor = 4875 + 1000 + 1000 = 6875? Either way, the quantities add up to a different number.
  2. If test is test: the train dataset has 11181 labels (6306 rumor, 4875 non-rumor), the validate dataset has 1492 labels (492 rumor, 1000 non-rumor), and the test dataset has 1104 labels (634 rumor, 470 non-rumor). Cumulative: rumor = 6306 + 492 + 634 = 7432, non-rumor = 4875 + 1000 + 470 = 6345. Are the image_test images not used? Why doesn't this add up to rumor: 7021 and real: 5974? Is the amount of experimental data really as you wrote it? Can I follow your statistics? I appreciate your help, thank you!
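
For reference, the sums in both settings can be checked with a short sketch. The per-split counts are the ones quoted in this comment; the `splits` dictionary and the `tally` helper are names made up for the example.

```python
# Per-split counts quoted in this comment: (rumor, non_rumor).
splits = {
    "train": (6306, 4875),
    "validate": (492, 1000),
    "test": (634, 470),
}


def tally(names):
    """Sum the rumor and non-rumor counts over the named splits."""
    rumor = sum(splits[name][0] for name in names)
    non_rumor = sum(splits[name][1] for name in names)
    return rumor, non_rumor


validate_is_test = tally(["train", "validate"])
test_is_test = tally(["train", "validate", "test"])

print(validate_is_test)  # (6798, 5875)
print(test_is_test)      # (7432, 6345)
```

Neither pair equals the 7021 and 5974 reported in the paper, which is the mismatch being asked about.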

gehong-coder, Jun 27 '22 05:06

Hi, right now I cannot find the original code that produced the statistics in the paper. I may be able to find it tomorrow on another computer. In the meantime, you can just use the statistics produced by the code, depending on the setting (validate is test or test is test).

xiaolan98, Jun 27 '22 06:06

The code was written quite a long time ago, and it's a little messy 😑. I'll try to find the original code, double-check the statistics, and update you as soon as possible.

xiaolan98, Jun 27 '22 06:06

You are really conscientious. Thank you very much for your answer.

gehong-coder, Jun 27 '22 06:06

No worries. It's my duty.

xiaolan98, Jun 27 '22 06:06

Hi, I checked my previous code, but I cannot find the original code that produced the statistics in the paper. You can just use the statistics printed during the training process, depending on whether you use 'validate is test' or 'test is test'.

xiaolan98, Jul 07 '22 02:07