FedML
How to convert other types of data into the fednlp format?
Hello, I'm trying to use fednlp for some experiments that need a different dataset as input. I tried to understand how the example dataset is read, but I don't know how to convert my own dataset into the required format. Can you give some tips or point me to documentation?
@Luoyang144 Hi! Thanks for asking. You can convert your custom dataset into h5 files by following fednlp/data/raw_data_loader/test/test_rawdataloader.py. Please check that file first, and then check the advanced_partition folder for creating partitions. Please let us know whether you can follow this.
@zuluzazu Thanks for the advice! I have checked those files, but I don't understand how to give my dataset a uniform distribution, since the examples all use the advanced partition methods. I tried reading the partitioned dataset, and as far as I can tell I need to store n_client and the tokenized sentences in the partition file. Is this how you convert a dataset? I'm not very familiar with federated learning, so sorry if this is a basic question, and I hope I'm not wasting your time.
@Luoyang144 Were you able to convert the data file into h5 format? That is, could you create yourdataset_data.h5 using raw_data_loader? If so, you can create a uniform partition using niid_label_skew.py. A uniform partition is just label skew with a very high alpha. Set alpha=10e5 and run niid_label_skew.py. Please feel free to ask follow-up questions.
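To illustrate why a very high alpha gives a nearly uniform split, here is a minimal standalone sketch of Dirichlet-based label-skew partitioning. It is not the code from niid_label_skew.py and does not read or write FedNLP h5 files; the function names `dirichlet` and `label_skew_partition` and the toy labels are assumptions made up for this example. It uses only the Python standard library.

```python
import random
from collections import defaultdict


def dirichlet(alpha, n, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) of dimension n
    by normalizing n independent Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def label_skew_partition(labels, n_clients, alpha, seed=0):
    """Hypothetical helper: split example indices across clients, label by label.

    For every label, the share each client receives is drawn from
    Dirichlet(alpha). Small alpha gives skewed shares; large alpha gives
    shares close to 1 / n_clients.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    clients = {c: [] for c in range(n_clients)}
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        shares = dirichlet(alpha, n_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(n_clients):
            cumulative += shares[c]
            # The last client takes the remainder so no index is dropped.
            if c == n_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[c].extend(indices[start:end])
            start = end
    return clients


# Toy data: 4000 examples with 4 balanced labels (made up for illustration).
labels = [i % 4 for i in range(4000)]
skewed = label_skew_partition(labels, n_clients=10, alpha=0.1)
uniform = label_skew_partition(labels, n_clients=10, alpha=10e5)
print("alpha=0.1 :", sorted(len(v) for v in skewed.values()))
print("alpha=10e5:", sorted(len(v) for v in uniform.values()))
```

With alpha=10e5 the client sizes should all land very close to 400, while alpha=0.1 should give a visibly uneven split.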
@zuluzazu Hello, thanks for your advice. I used the Seq2SeqRawDataLoader class to generate the h5 file with the generate_h5_file function. And I guess you mean niid_label.py in this link instead of niid_label_skew.py?
I am trying to generate partition_data.h5 with this code, but in this file there is no task_type "sequence_to_sequence", which is what I need to run (sequence_to_sequence is the task type provided in niid_label.py).
Can you give me further advice?
Hello, over the past few days I tried to use kmeans to generate a partition_data file, but running it still produces an error. Can you provide some advice?
@Luoyang144 Please share your data workflow and the error log, and we will follow up ASAP.
@Luoyang144 Can you please give it one more shot using the latest FedML version?