Some questions about using my own dataset

Open check-777 opened this issue 2 years ago • 2 comments

Thanks for the nice work! If I want to use another dataset from "mulan", such as emotion, how can I convert it to the format of the datasets used in the paper? At the top of preprocess.py I saw "../data/reuters/train_inputs.txt -train_tgt ../data/reuters/train_labels.txt", but I don't know what train_inputs.txt and train_labels.txt should contain. Could you give an example of each file? Thanks!

check-777 avatar Jul 09 '21 03:07 check-777

I have the same problem. Have you solved it?

untiltheday-lin avatar Oct 11 '22 07:10 untiltheday-lin

I don't have access to the raw data anymore as my home directory was removed from the university servers. I believe preprocess.py expects a train_inputs.txt file where each line is a sample and a train_labels.txt file where each line contains the sequence of labels for the sample on the same line.
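For anyone building these files from their own data, here is a minimal sketch of the layout described above: one sample per line in the inputs file, and that sample's labels on the matching line of the labels file. The function name `write_parallel_files`, the `(tokens, labels)` pair structure, and the space-separated token and label format are all assumptions for illustration; the exact format preprocess.py expects has not been confirmed against the original data. The file names follow the path quoted from the top of preprocess.py.

```python
from pathlib import Path


def write_parallel_files(samples, out_dir, split="train"):
    """Write <split>_inputs.txt and <split>_labels.txt with aligned lines.

    `samples` is an iterable of (tokens, labels) pairs, where both items
    are lists of strings. Space-separated output is an assumption, not a
    confirmed requirement of preprocess.py.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    inputs_path = out_dir / f"{split}_inputs.txt"
    labels_path = out_dir / f"{split}_labels.txt"

    with open(inputs_path, "w", encoding="utf-8") as f_in, \
            open(labels_path, "w", encoding="utf-8") as f_lab:
        for tokens, labels in samples:
            # Line i of the inputs file pairs with line i of the labels file.
            f_in.write(" ".join(tokens) + "\n")
            f_lab.write(" ".join(labels) + "\n")

    return inputs_path, labels_path
```

Calling `write_parallel_files(samples, "../data/my_dataset")` with your own list of pairs would produce two files with the same number of lines, which is the property the description above relies on. The directory name here is hypothetical.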

Alternatively, you can write your own preprocessing file to match the .pt PyTorch object files that are currently used.
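To match the existing .pt files, it helps to see how they are laid out first. The sketch below is a generic helper, not part of this repository: `describe` is a hypothetical name, and it makes no assumption about which keys or tensors the files actually contain. It only walks whatever nested object it is given and reports the container types, lengths, and shapes it finds.

```python
def describe(obj, indent=0, max_items=5):
    """Return a list of lines summarising the structure of a nested object."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        lines.append(f"{pad}dict with {len(obj)} keys")
        # Only show the first few keys so large vocabularies stay readable.
        for key in list(obj)[:max_items]:
            lines.append(f"{pad}  key {key!r}:")
            lines.extend(describe(obj[key], indent + 2, max_items))
    elif isinstance(obj, (list, tuple)):
        lines.append(f"{pad}{type(obj).__name__} of length {len(obj)}")
        if obj:
            lines.append(f"{pad}  first element:")
            lines.extend(describe(obj[0], indent + 2, max_items))
    elif hasattr(obj, "shape"):
        # Covers tensors and arrays without importing torch or numpy here.
        lines.append(f"{pad}{type(obj).__name__} with shape {tuple(obj.shape)}")
    else:
        lines.append(f"{pad}{type(obj).__name__}: {repr(obj)[:60]}")
    return lines
```

With PyTorch installed, something like `print("\n".join(describe(torch.load("path/to/train.pt"))))` would print the layout of one of the existing files (the path is a placeholder). A custom preprocessing script can then be written to save an object with the same structure via `torch.save`.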

jacklanchantin avatar Oct 11 '22 13:10 jacklanchantin