EEG-To-Text
Dataset Preprocessing
Hello. As far as I understand, you are storing the data in a pandas DataFrame with one column corresponding to the EEG signals and another to the text, and then converting the EEG signals to text. Is that correct? Could you elaborate on how you arrived at this dataset format so that others can organize the dataset the same way?
Hi! Sorry, I am not sure what you mean by pandas. The data preprocessing scripts can be found in scripts/prepare_dataset.sh; for example, util/construct_dataset_mat_to_pickle_v1.py will convert the ZuCo v1.0 .mat file into a .pickle file, which is like a Python dictionary.
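If it helps anyone reading along, here is a minimal sketch of how one might peek inside such a .pickle file before building anything on top of it. The helper name summarize_pickle and the toy dictionary are made up for illustration; the keys below are NOT the actual layout produced by the ZuCo conversion script, so check the real file yourself. Also note that pickle.load should only be used on files you trust.

```python
import os
import pickle
import tempfile


def summarize_pickle(path):
    """Load a pickle file and describe its top-level structure.

    Hypothetical helper for illustration; not part of the repository.
    """
    with open(path, "rb") as handle:
        data = pickle.load(handle)
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    return type(data).__name__


# Toy stand-in data with made-up keys (assumption: the real file is dict-like).
toy_data = {"subject_A": [{"content": "a sentence"}], "subject_B": []}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.pickle")
    with open(toy_path, "wb") as handle:
        pickle.dump(toy_data, handle)
    summary = summarize_pickle(toy_path)

print(summary)
```

Pointing summarize_pickle at the real output file should show which top-level keys exist and what kind of object sits under each one.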
Pandas is a data analysis library in Python used to build DataFrames. I was actually asking for instructions on how to build the dataset in a format where one column corresponds to the EEG signals and another to the text, so that I can create seq2seq models that take EEG as input and generate text.
Actually, I figured it out! After creating train_set and dev_set, I just used this snippet of code:
import pandas as pd

def dataset_to_dataframe(dataset):
    # Initialize lists to hold data
    input_embeddings_list = []
    seq_len_list = []
    input_attn_mask_list = []
    input_attn_mask_invert_list = []
    target_strings_list = []
    sent_level_EEG_list = []

    # Iterate through the dataset
    for i in range(len(dataset)):
        input_embeddings, seq_len, input_attn_mask, input_attn_mask_invert, target_string, sent_level_EEG = dataset[i]

        # Convert tensors to numpy arrays
        input_embeddings_np = input_embeddings.numpy()
        input_attn_mask_np = input_attn_mask.numpy()
        input_attn_mask_invert_np = input_attn_mask_invert.numpy()
        sent_level_EEG_np = sent_level_EEG.numpy()

        # Append to lists
        input_embeddings_list.append(input_embeddings_np)
        seq_len_list.append(seq_len)
        input_attn_mask_list.append(input_attn_mask_np)
        input_attn_mask_invert_list.append(input_attn_mask_invert_np)
        target_strings_list.append(target_string)
        sent_level_EEG_list.append(sent_level_EEG_np)

    # Create DataFrame
    df = pd.DataFrame({
        'Input Embeddings': input_embeddings_list,
        'Sequence Length': seq_len_list,
        'Input Attention Mask': input_attn_mask_list,
        'Input Attention Mask Invert': input_attn_mask_invert_list,
        'Target String': target_strings_list,
        'Sentence Level EEG': sent_level_EEG_list
    })
    return df

# Convert datasets to dataframes
train_df = dataset_to_dataframe(train_set)
dev_df = dataset_to_dataframe(dev_set)
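One follow-up note for anyone who wants to save these DataFrames: each cell in the embedding and mask columns holds a whole NumPy array, so writing to CSV would turn the arrays into strings. A rough sketch of a pickle round trip that keeps the arrays intact is below. The helper save_and_reload, the toy shapes, and the file name are assumptions for illustration only; real rows would come from dataset_to_dataframe.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in rows with made-up shapes (assumption); real rows would come from
# dataset_to_dataframe(train_set).
toy_df = pd.DataFrame({
    "Input Embeddings": [
        np.zeros((4, 8), dtype=np.float32),
        np.ones((4, 8), dtype=np.float32),
    ],
    "Target String": ["first sentence", "second sentence"],
})


def save_and_reload(df, path):
    """Write a DataFrame to a pickle file and read it back (hypothetical helper)."""
    df.to_pickle(path)
    return pd.read_pickle(path)


with tempfile.TemporaryDirectory() as tmp_dir:
    reloaded_df = save_and_reload(toy_df, os.path.join(tmp_dir, "toy_df.pkl"))

# Each cell still holds a NumPy array rather than a string.
first_embedding = reloaded_df.loc[0, "Input Embeddings"]
print(type(first_embedding).__name__, first_embedding.shape)
```

The same pattern should apply to train_df and dev_df, although for large datasets the resulting files can get big since every array is stored in full.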