
about your dataset

Open DeathYmz opened this issue 4 years ago • 10 comments

Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?

DeathYmz · Jul 30 '20

> Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?

Hello, do you know how to generate these files now?

lt15523290043 · May 15 '21

OK, a long time has passed and I've forgotten the details of how I debugged it. I can post some of the preprocessing code I wrote; I hope it is useful to you:

```python
import pickle
import jieba
import process_data_weibo  # helper module from the repo

# Initializations implied by the snippet below. deal_image() is assumed to
# map image_id -> file extension for the images in a folder.
word2ix, ix2word = {}, {}
wordcnt = 0
max_seq_len = 0
max_event = 0
data, val_data = [], []

train_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/train_id.pickle", 'rb'))
val_id = pickle.load(open("../EANN-KDD18-master/Data/weibo/validate_id.pickle", 'rb'))

stop_words = process_data_weibo.stopwordslist()
pre_path = 'F:/data/EANN-KDD18-master/Data/weibo/tweets/'
file_list = [pre_path + "test_nonrumor.txt", pre_path + "test_rumor.txt",
             pre_path + "train_nonrumor.txt", pre_path + "train_rumor.txt"]
nonrumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/')
rumor_images = deal_image('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/')

# Build train/val examples: keep only posts whose id appears in train_id / val_id.
for k, fname in enumerate(file_list):
    with open(fname, encoding='utf-8') as fp:
        lines = fp.readlines()
    if (k + 1) % 2 == 1:
        label = 0  # real is 0
    else:
        label = 1  # fake is 1
    post_id = ""
    url = ""
    # Each post occupies three lines: id line, image-url line, text line.
    for i, line in enumerate(lines):
        if (i + 1) % 3 == 1:
            post_id = line.split('|')[0]
        if (i + 1) % 3 == 2:
            url = line.lower()
        if (i + 1) % 3 == 0:
            line = process_data_weibo.clean_str_sst(line)
            seg_list = jieba.cut_for_search(line)  # Chinese word segmentation
            new_seg_list = [word for word in seg_list if word not in stop_words]
            clean_l = ' '.join(new_seg_list)
            if len(clean_l) > 10 and post_id in train_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(train_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/'
                                     + image_id + '.' + nonrumor_images[image_id])
                        data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/'
                                     + image_id + '.' + rumor_images[image_id])
                        data.append([describe, image_url, label, event])
            elif len(clean_l) > 10 and post_id in val_id:
                describe = []
                for x in new_seg_list:
                    if x not in word2ix:
                        word2ix[x] = wordcnt
                        ix2word[wordcnt] = x
                        wordcnt += 1
                    describe.append(word2ix[x])
                max_seq_len = max(max_seq_len, len(describe))
                event = int(val_id[post_id])
                max_event = max(max_event, event)
                for x in url.split('|'):
                    image_id = x.split('/')[-1].split(".")[0]
                    if label == 0 and image_id in nonrumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/nonrumor_images/'
                                     + image_id + '.' + nonrumor_images[image_id])
                        val_data.append([describe, image_url, label, event])
                    elif label == 1 and image_id in rumor_images:
                        image_url = ('F:/data/EANN-KDD18-master/Data/weibo/rumor_images/'
                                     + image_id + '.' + rumor_images[image_id])
                        val_data.append([describe, image_url, label, event])
```
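On the original question: judging from how the code above uses these files (`post_id in train_id`, `event = int(train_id[post_id])`), each of train_id.pickle / validate_id.pickle / test_id.pickle appears to be just a dict mapping a post id string to an integer event label. A minimal sketch of writing one such file, with made-up post ids (the event labels themselves would have to come from your own grouping of posts into events):

```python
import pickle

# Hypothetical Weibo post ids mapped to integer event labels. The snippet
# above only requires this shape: {post_id_str: event_int}.
train_id = {
    "3500389257126199": 0,  # made-up post id
    "3500412589377522": 0,
    "3501877745639063": 1,
}

with open("../EANN-KDD18-master/Data/weibo/train_id.pickle", "wb") as f:
    pickle.dump(train_id, f)
```

validate_id.pickle and test_id.pickle would be built the same way from their respective splits.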


DeathYmz · May 15 '21

Thank you very much for your reply. Could you share the code that generates the w2v.pickle file?

lt15523290043 · May 15 '21

> Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?

Thank you very much for your reply. Could you tell me the code that generates the w2v.pickle file? It's very important to me. I'm a beginner; I'm sorry to bother you.

lt15523290043 · May 16 '21

The code in the repo is pretty clear; if you take a closer look you can work out what each part does.

```python
def get_data(text_only):
    # text_only = False
    if text_only:
        print("Text only")
        image_list = []
    else:
        print("Text and image")
        image_list = read_image()

    train_data = write_data("train", image_list, text_only)
    valiate_data = write_data("validate", image_list, text_only)
    test_data = write_data("test", image_list, text_only)

    print("loading data...")
    # w2v_file = '../Data/GoogleNews-vectors-negative300.bin'
    vocab, all_text = load_data(train_data, valiate_data, test_data)

    print("number of sentences: " + str(len(all_text)))
    print("vocab size: " + str(len(vocab)))
    max_l = len(max(all_text, key=len))
    print("max sentence length: " + str(max_l))

    word_embedding_path = "../EANN-KDD18-master/Data/weibo/w2v.pickle"
    w2v = pickle.load(open(word_embedding_path, "rb"), encoding='bytes')
    print("word2vec loaded!")
    print("num words already in word2vec: " + str(len(w2v)))

    # Give random vectors to vocab words missing from w2v, then build the
    # embedding matrix W and the word -> index map.
    add_unknown_words(w2v, vocab)
    W, word_idx_map = get_W(w2v)
    W2 = rand_vecs = {}
    w_file = open("../EANN-KDD18-master/Data/weibo/word_embedding.pickle", "wb")
    pickle.dump([W, W2, word_idx_map, vocab, max_l], w_file)
    w_file.close()

    return train_data, valiate_data, test_data
```
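On the w2v.pickle question: the function above just loads it as a plain dict of word vectors, so one way to produce it yourself is to train word2vec on the jieba-segmented posts and pickle a {word: vector} dict. A minimal sketch, assuming gensim 4.x (in 3.x the argument is `size` rather than `vector_size`); the token lists and the embedding dimension of 32 here are placeholders, not the authors' script:

```python
import pickle
from gensim.models import Word2Vec

# `token_lists` stands in for the jieba-segmented posts from the
# preprocessing step shown earlier in this thread.
token_lists = [
    ["北京", "暴雨", "辟谣"],
    ["上海", "地铁", "故障"],
    ["网传", "消息", "不实"],
]

# min_count=1 keeps every word; pick vector_size to match your model config.
model = Word2Vec(token_lists, vector_size=32, window=5, min_count=1, workers=1)

# Pickle a plain {word: vector} dict, which matches how get_data() above
# consumes w2v.pickle (len(w2v), add_unknown_words(w2v, vocab), get_W(w2v)).
w2v = {word: model.wv[word] for word in model.wv.index_to_key}
with open("../EANN-KDD18-master/Data/weibo/w2v.pickle", "wb") as f:
    pickle.dump(w2v, f)
```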


DeathYmz · May 16 '21


I used the Weibo files and it worked, but I don't know how to use the Twitter data. Could you tell me whether you used the Twitter files in your experiments?

lt15523290043 · May 24 '21

Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle? Thank you!

youran521 · Dec 23 '21

> Hello, can I ask how you generate validate_id.pickle/train_id.pickle/test_id.pickle?
>
> Thank you very much for your reply. Could you tell me the code that generates the w2v.pickle file? It's very important to me. I'm a beginner; I'm sorry to bother you.

Hello, I have the same problem. Could you please tell me what I should do?

Dxy-cpu · Jan 23 '22

My email address is [email protected].

Dxy-cpu · Jan 23 '22

I have the same problem. Can you tell me how to solve it?

balabalacc · May 19 '23