
Process mosei_senti_data.pkl to match the text id in mosei.hdf5

Open ZhuoZHI-UCL opened this issue 1 year ago • 0 comments

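If you are using mosei_senti_data.pkl and want to recover the raw text by matching its ids against those in mosei.hdf5, consider using the following script to process the data. It rewrites each id in the test split as the first element of the original id entry followed by a running index in square brackets.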

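import pickle

import numpy as np
from tqdm import tqdm

# Load the pickled dataset; the file handle is closed automatically.
with open('data/mosei_senti_data.pkl', 'rb') as f:
    file1 = pickle.load(f)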

data = file1['test']['id']

# Keep the first element of each id entry and append a per-key running
# index, producing labels of the form key[0], key[1], ...
modified_data = []
counters = {}
for element in tqdm(data, desc="Processing elements"):
    key = element[0]
    if key not in counters:
        counters[key] = 0
    modified_data.append(f"{key}[{counters[key]}]")
    counters[key] += 1


file1['test']['id'] = np.array(modified_data)


with open('data/mosei_new.pkl', 'wb') as f:
    pickle.dump(file1, f)

print('all done!')
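
To make the renaming step easier to check before running it on the real file, here is a minimal sketch of the same counting logic applied to a toy list. The function name `number_segments` and the toy ids are hypothetical and do not come from the dataset; the sketch only assumes, as the script above does, that the first element of each id entry is the key to keep.

```python
def number_segments(ids):
    """Label each entry as '<key>[<n>]', where n counts earlier entries with the same key."""
    counters = {}
    labels = []
    for element in ids:
        key = element[0]              # first element of the id entry
        n = counters.get(key, 0)      # how many times this key has been seen so far
        labels.append(f"{key}[{n}]")
        counters[key] = n + 1
    return labels


# Hypothetical toy ids: (key, start, end). Real entries may differ in layout.
toy_ids = [
    ("vidA", "0.0", "3.2"),
    ("vidA", "3.2", "7.5"),
    ("vidB", "0.0", "4.1"),
    ("vidA", "7.5", "9.0"),
]
print(number_segments(toy_ids))
```

Note that the index is assigned in order of appearance, so the labels only line up with another file if entries sharing a key are stored in the same order there; that is an assumption worth verifying on a few samples.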

ZhuoZHI-UCL, Jun 19 '24 14:06