EANN-KDD18
about your dataset
Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?
Hello, do you know how to generate these files now?
OK, a long time has passed and I have forgotten how I debugged it. I can post some of the code I used to process the data; I hope it will be useful to you.

train_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/train_id.pickle", 'rb'))
val_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/validate_id.pickle", 'rb'))
stop_words = process_data_weibo.stopwordslist()
pre_path = 'F:/data/EANN-KDD18-master/Data/weibo/tweets/'
file_list = [pre_path + "test_nonrumor.txt", pre_path + "test_rumor.txt",
             pre_path + "train_nonrumor.txt", pre_path + "train_rumor.txt"]
nonrumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/')
rumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/')

# train
for k, f in enumerate(file_list):
    f = open(f, encoding='utf-8')
    if (k + 1) % 2 == 1:
        label = 0  # real is 0
    else:
        label = 1  # fake is 1
    lines = f.readlines()
    post_id = ""
    url = ""
    for i, line in enumerate(lines):
        if (i + 1) % 3 == 1:
            post_id = line.split('|')[0]
        if (i + 1) % 3 == 2:
            url = line.lower()
        if (i + 1) % 3 == 0:
            line = process_data_weibo.clean_str_sst(line)
            seg_list = jieba.cut_for_search(line)  # Chinese word segmentation
            new_seg_list = []
            for word in seg_list:
                if word not in stop_words:
                    new_seg_list.append(word)
            clean_l = ' '.join(new_seg_list)
            if len(clean_l) > 10 and post_id in train_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(train_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/' + image_id + '.' + nonrumor_images[image_id]
                        data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/rumor_images/' + image_id + '.' + rumor_images[image_id]
                        data.append([describe, image_url, label, event])
            elif len(clean_l) > 10 and post_id in val_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(val_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/' + image_id + '.' + nonrumor_images[image_id]
                        val_data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = 'F:/data/EANN-KDD18-master/Data/weibo/rumor_images/' + image_id + '.' + rumor_images[image_id]
                        val_data.append([describe, image_url, label, event])
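From the loading code above, train_id.pickle and validate_id.pickle are simply pickled dictionaries mapping a post id (string) to an integer event label. The snippet below is only a minimal sketch of how such files could be written; the example ids, the event labels, and the split itself are placeholders for illustration, not the authors' actual generation script.

import pickle

# Hypothetical split: each dict maps post_id -> event label.
# How posts are grouped into events (and into train/validate/test)
# is up to you; the values here are made up for illustration.
train_ids = {"1234567890": 0, "1234567891": 0, "1234567892": 1}
validate_ids = {"2234567890": 2}
test_ids = {"3234567890": 3}

out_dir = "../EANN-KDD18-master/Data/weibo/"
for name, ids in [("train_id", train_ids),
                  ("validate_id", validate_ids),
                  ("test_id", test_ids)]:
    with open(out_dir + name + ".pickle", "wb") as f:
        pickle.dump(ids, f)

# Reading the files back gives exactly the objects used above, e.g.
# train_id = pickle.load(open(out_dir + "train_id.pickle", "rb"))
# event = int(train_id[post_id])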
Thank you very much for your reply. Could you tell me the relevant code for the w2v.pickle file? It's very important to me. I'm a beginner; I'm sorry to bother you.
The code in the repository is pretty good; if you take a closer look, you can work out what each part does.
def get_data(text_only):  # text_only = False
    if text_only:
        print("Text only")
        image_list = []
    else:
        print("Text and image")
        image_list = read_image()

    train_data = write_data("train", image_list, text_only)
    valiate_data = write_data("validate", image_list, text_only)
    test_data = write_data("test", image_list, text_only)

    print("loading data...")
    # w2v_file = '../Data/GoogleNews-vectors-negative300.bin'
    vocab, all_text = load_data(train_data, valiate_data, test_data)
    # print(str(len(all_text)))
    print("number of sentences: " + str(len(all_text)))
    print("vocab size: " + str(len(vocab)))
    max_l = len(max(all_text, key=len))
    print("max sentence length: " + str(max_l))

    word_embedding_path = "../EANN-KDD18-master/Data/weibo/w2v.pickle"
    w2v = pickle.load(open(word_embedding_path, "rb"), encoding='bytes')
    # print(w2v)
    # input("w2v over")
    print("word2vec loaded!")
    print("num words already in word2vec: " + str(len(w2v)))
    add_unknown_words(w2v, vocab)
    W, word_idx_map = get_W(w2v)
    # rand_vecs = {}
    # add_unknown_words(rand_vecs, vocab)
    W2 = rand_vecs = {}
    w_file = open("../EANN-KDD18-master/Data/weibo/word_embedding.pickle", "wb")
    pickle.dump([W, W2, word_idx_map, vocab, max_l], w_file)
    w_file.close()

    return train_data, valiate_data, test_data
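Note that get_data() only loads w2v.pickle; it does not create it. From the way it is used here (len(w2v), add_unknown_words(w2v, vocab), get_W(w2v)), w2v behaves like a dictionary mapping each word to its embedding vector. Below is a minimal sketch of one way to produce such a file with gensim; the gensim dependency, the 32-dimensional vectors, and the placeholder sentences are my assumptions, not something stated in the repository.

import pickle
from gensim.models import Word2Vec

# 'sentences' should be the segmented Weibo texts, one token list per post
# (e.g. the new_seg_list values from the preprocessing code earlier).
sentences = [["这", "是", "一个", "例子"], ["另", "一条", "微博"]]  # placeholder data

# Training settings (vector size, window, etc.) are assumptions.
model = Word2Vec(sentences, vector_size=32, window=5, min_count=1, workers=4)

# Convert the trained model into a plain {word: vector} dict, the structure
# that get_data() expects when it calls len(w2v) and add_unknown_words(w2v, vocab).
w2v = {word: model.wv[word] for word in model.wv.index_to_key}

with open("../EANN-KDD18-master/Data/weibo/w2v.pickle", "wb") as f:
    pickle.dump(w2v, f)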
I used the Weibo files and it worked, but I don't know how to use the Twitter data. Could you tell me whether you used the Twitter files in your experiment?
Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle? Thank you!
Hello, I have the same problem. Could you please tell me what I should do?
I have the same problem. Can you tell me how to solve it?