
about your dataset

Open DeathYmz opened this issue 4 years ago • 10 comments

Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?

DeathYmz · Jul 30 '20

> Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?

Hello, do you know how to generate these files now?

lt15523290043 · May 15 '21

OK, a long time has passed and I've forgotten the details of how I debugged it. I can post some of the preprocessing code I wrote; I hope it is useful to you:

```python
import pickle
import jieba
import process_data_weibo  # helper module from the repo

# Initializations implied by the snippet below. deal_image() is assumed to
# map image_id -> file extension for the images in a folder.
word2ix, ix2word = {}, {}
wordcnt = 0
max_seq_len = 0
max_event = 0
data, val_data = [], []

train_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/train_id.pickle", 'rb'))
val_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/validate_id.pickle", 'rb'))

stop_words = process_data_weibo.stopwordslist()
pre_path = 'F:/data/EANN-KDD18-master/Data/weibo/tweets/'
file_list = [pre_path + "test_nonrumor.txt", pre_path + "test_rumor.txt",
             pre_path + "train_nonrumor.txt", pre_path + "train_rumor.txt"]
nonrumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/')
rumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/')

# Build train/val examples: keep only posts whose id appears in train_id / val_id.
for k, fname in enumerate(file_list):
    with open(fname, encoding='utf-8') as fp:
        lines = fp.readlines()
    if (k + 1) % 2 == 1:
        label = 0  # real is 0
    else:
        label = 1  # fake is 1
    post_id = ""
    url = ""
    # Each post occupies three lines: id line, image-url line, text line.
    for i, line in enumerate(lines):
        if (i + 1) % 3 == 1:
            post_id = line.split('|')[0]
        if (i + 1) % 3 == 2:
            url = line.lower()
        if (i + 1) % 3 == 0:
            line = process_data_weibo.clean_str_sst(line)
            seg_list = jieba.cut_for_search(line)  # Chinese word segmentation
            new_seg_list = [word for word in seg_list if word not in stop_words]
            clean_l = ' '.join(new_seg_list)
            if len(clean_l) > 10 and post_id in train_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(train_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/'
                                     + image_id + '.' + nonrumor_images[image_id])
                        data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/'
                                     + image_id + '.' + rumor_images[image_id])
                        data.append([describe, image_url, label, event])
            elif len(clean_l) > 10 and post_id in val_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(val_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/'
                                     + image_id + '.' + nonrumor_images[image_id])
                        val_data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/'
                                     + image_id + '.' + rumor_images[image_id])
                        val_data.append([describe, image_url, label, event])
```
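On the original question: judging from how the code above uses these files (`post_id in train_id`, `event = int(train_id[post_id])`), each of train_id.pickle / validate_id.pickle / test_id.pickle appears to be just a dict mapping a post id string to an integer event label. A minimal sketch of writing one such file, with made-up post ids (the event labels themselves would have to come from your own grouping of posts into events):

```python
import pickle

# Hypothetical Weibo post ids mapped to integer event labels. The snippet
# above only requires this shape: {post_id_str: event_int}.
train_id = {
    "3500389257126199": 0,  # made-up post id
    "3500412589377522": 0,
    "3501877745639063": 1,
}

with open("../EANN-KDD18-master/Data/weibo/train_id.pickle", "wb") as f:
    pickle.dump(train_id, f)
```

validate_id.pickle and test_id.pickle would be built the same way from their respective splits.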


DeathYmz · May 15 '21

Thank you very much for your reply. Could you share the code that generates the w2v.pickle file?

lt15523290043 · May 15 '21

> Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?

Thank you very much for your reply. Could you tell me the code that generates the w2v.pickle file? It's very important to me. I'm a beginner; I'm sorry to bother you.

lt15523290043 · May 16 '21

The code in the repo is pretty clear; if you take a closer look you can work out what each part does.

```python
def get_data(text_only):
    # text_only = False
    if text_only:
        print("Text only")
        image_list = []
    else:
        print("Text and image")
        image_list = read_image()

    train_data = write_data("train", image_list, text_only)
    valiate_data = write_data("validate", image_list, text_only)
    test_data = write_data("test", image_list, text_only)

    print("loading data...")
    # w2v_file = '../Data/GoogleNews-vectors-negative300.bin'
    vocab, all_text = load_data(train_data, valiate_data, test_data)

    print("number of sentences: " + str(len(all_text)))
    print("vocab size: " + str(len(vocab)))
    max_l = len(max(all_text, key=len))
    print("max sentence length: " + str(max_l))

    word_embedding_path = "../EANN-KDD18-master/Data/weibo/w2v.pickle"
    w2v = pickle.load(open(word_embedding_path, "rb"), encoding='bytes')
    print("word2vec loaded!")
    print("num words already in word2vec: " + str(len(w2v)))

    # Give random vectors to vocab words missing from w2v, then build the
    # embedding matrix W and the word -> index map.
    add_unknown_words(w2v, vocab)
    W, word_idx_map = get_W(w2v)
    W2 = rand_vecs = {}
    w_file = open("../EANN-KDD18-master/Data/weibo/word_embedding.pickle", "wb")
    pickle.dump([W, W2, word_idx_map, vocab, max_l], w_file)
    w_file.close()

    return train_data, valiate_data, test_data
```
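On the w2v.pickle question: the function above just loads it as a plain dict of word vectors, so one way to produce it yourself is to train word2vec on the jieba-segmented posts and pickle a {word: vector} dict. A minimal sketch, assuming gensim 4.x (in 3.x the argument is `size` rather than `vector_size`); the token lists and the embedding dimension of 32 here are placeholders, not the authors' script:

```python
import pickle
from gensim.models import Word2Vec

# `token_lists` stands in for the jieba-segmented posts from the
# preprocessing step shown earlier in this thread.
token_lists = [
    ["北京", "暴雨", "辟谣"],
    ["上海", "地铁", "故障"],
    ["网传", "消息", "不实"],
]

# min_count=1 keeps every word; pick vector_size to match your model config.
model = Word2Vec(token_lists, vector_size=32, window=5, min_count=1, workers=1)

# Pickle a plain {word: vector} dict, which matches how get_data() above
# consumes w2v.pickle (len(w2v), add_unknown_words(w2v, vocab), get_W(w2v)).
w2v = {word: model.wv[word] for word in model.wv.index_to_key}
with open("../EANN-KDD18-master/Data/weibo/w2v.pickle", "wb") as f:
    pickle.dump(w2v, f)
```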


DeathYmz · May 16 '21


I used the Weibo files and it worked, but I don't know how to use the Twitter data. Could you tell me whether you used the Twitter files in your experiments?

lt15523290043 · May 24 '21

Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle? Thank you!

youran521 · Dec 23 '21

> Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?
>
> Thank you very much for your reply. Could you tell me the code that generates the w2v.pickle file? It's very important to me. I'm a beginner; I'm sorry to bother you.

Hello, I have the same problem. Could you please tell me what I should do?

Dxy-cpu · Jan 23 '22

My email address is [email protected].

Dxy-cpu · Jan 23 '22

I have the same problem. Can you tell me how to solve it?

balabalacc · May 19 '23