FedML

How do I convert other types of data into the FedNLP format?

Open • Luoyang144 opened this issue 2 years ago • 7 comments

Hello, I'm trying to use FedNLP for some experiments that need a different dataset as input. I tried to understand how the example datasets are read, but I don't know how to convert my dataset into the required format. Can you give me some tips or point me to documentation?

Luoyang144 • Jul 21 '22

@Luoyang144 Hi! Thanks for asking. You can convert your custom dataset into h5 files by following fednlp/data/raw_data_loader/test/test_rawdataloader.py. Please check that file, and then, for creating partitions, check the advanced_parition folder. Let me know whether you can follow it.

MrigankRaman • Jul 22 '22

@zuluzazu Thanks for the advice! I have checked those files, but I don't understand how to split my dataset with a uniform distribution, since the examples use advanced partition methods. I tried reading the partitioned dataset, and as I understand it, I need to store n_client and the tokenized sentences in the partition file. Is this how you convert a dataset? I'm not very familiar with federated learning, so sorry if this is a naive question; I hope I'm not wasting your time.

Luoyang144 • Jul 23 '22
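For readers with the same question: a uniform (IID) split only has to record how many clients there are and which samples each client owns. The sketch below assumes a hypothetical layout with the keys "n_clients" and "partition_data", where each client maps to a list of sample indices into the data file rather than to tokenized sentences. The real schema of FedNLP's partition h5 files is not shown in this thread and may differ, and uniform_partition is a hypothetical helper, not part of FedNLP.

```python
import random


def uniform_partition(n_samples, n_clients, seed=0):
    """Shuffle sample indices and deal them into n_clients near-equal shards.

    Hypothetical helper: the returned dict layout is an assumption for
    illustration, not FedNLP's actual partition-file schema.
    """
    if n_clients <= 0:
        raise ValueError("n_clients must be positive")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Client k takes every n_clients-th shuffled index starting at position k,
    # so shard sizes differ by at most one sample.
    shards = {k: sorted(indices[k::n_clients]) for k in range(n_clients)}
    return {"n_clients": n_clients, "partition_data": shards}


if __name__ == "__main__":
    part = uniform_partition(n_samples=10, n_clients=3)
    # Shard sizes per client: 4, 3 and 3 for 10 samples over 3 clients.
    print(part["n_clients"], {k: len(v) for k, v in part["partition_data"].items()})
```

Storing indices instead of tokenized text keeps the partition independent of the tokenizer, so the same split can be reused with different models.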

@Luoyang144 Could you convert the data file into h5 format? That is, could you create yourdataset_data.h5 using raw_data_loader? If yes, then you can create a uniform partition using niid_label_skew.py. A uniform partition is nothing but label skew with a very, very high alpha: just set alpha=10e5 and run niid_label_skew.py. Please feel free to ask follow-up questions.

MrigankRaman • Jul 23 '22
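Why a very high alpha behaves like a uniform split can be sketched with the common Dirichlet label-skew scheme: for each label, draw client proportions from Dirichlet(alpha, ..., alpha) and split that label's samples accordingly. This assumes niid_label_skew.py / niid_label.py follow this scheme; dirichlet and label_skew_partition are hypothetical names, not FedNLP code. As alpha grows, every draw concentrates around 1/n_clients, so each client gets nearly the same share of every label. Note that 10e5 in Python notation is 1,000,000.

```python
import random
from collections import defaultdict


def dirichlet(alpha, k, rng):
    """Sample one symmetric Dirichlet(alpha) vector of length k via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def label_skew_partition(labels, n_clients, alpha, seed=0):
    """Hypothetical sketch: split each label's samples across clients in Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    clients = {k: [] for k in range(n_clients)}
    for y in sorted(by_label):
        idxs = by_label[y]
        rng.shuffle(idxs)
        props = dirichlet(alpha, n_clients, rng)
        # Turn cumulative proportions into cut points; the last client takes the remainder.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds = [0] + cuts + [len(idxs)]
        for k in range(n_clients):
            clients[k].extend(idxs[bounds[k]:bounds[k + 1]])
    return clients


if __name__ == "__main__":
    labels = [0] * 50 + [1] * 50
    for alpha in (10e5, 0.5):
        parts = label_skew_partition(labels, n_clients=5, alpha=alpha)
        # Count of label-0 samples per client: near-equal for huge alpha, lopsided for small alpha.
        print(alpha, [[labels[i] for i in parts[k]].count(0) for k in range(5)])
```

With a small alpha (for example 0.5) the per-client label counts become uneven, which is the non-IID setting the advanced partition scripts are meant to produce.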

@zuluzazu Hello, thanks for your advice. I used the Seq2SeqRawDataLoader class to generate the h5 file with the generate_h5_file function. I guess you mean niid_label.py in this link rather than niid_label_skew.py? I am trying to generate parition_data.h5 with this code, but that file has no task_type "sequence_to_sequence", which is the one I need to run (sequence_to_sequence is the task type provided in niid_label.py). Can you give me further advice?

Luoyang144 • Jul 23 '22

Hello, over the past few days I tried to use kmeans to generate a parition_data file, but I still get an error when I run it. Can you provide some advice?

Luoyang144 • Jul 29 '22

@Luoyang144 Please share your data workflow and the error log, and we will follow up as soon as possible.

chaoyanghe • Aug 19 '22

@Luoyang144 Can you please give it one more shot using the latest FedML version?

fedml-dimitris • Oct 25 '23