UFold

How can I generate the training datasets?

llfzllfz opened this issue 2 years ago • 1 comment

I've downloaded the RNAStralign dataset from mxfold2, and it has 8 subfolders. In your code in process_data_newdataset.py, I only see os.listdir(), which does not descend into subfolders. So what should I do to generate the training datasets? Thanks.

llfzllfz, Sep 19 '22 10:09

Hi there,

It depends on how you would like to handle the data. In our work, we merged all the files in the RNAStralign dataset into one folder and used the whole dataset for training. If you want to check performance on the various species separately, you may need to keep the subfolders separate, as illustrated in the e2efold paper. In short, it depends on how you want to use the data.
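For the "merge everything into one folder" route, a minimal sketch of the flattening step might look like the following. This is not code from the UFold repository: the function name `flatten_dataset`, the `.ct`/`.bpseq` suffixes, and the `__` separator used to avoid name collisions are all assumptions made for illustration, so adjust them to match the files you actually have.

```python
import shutil
from pathlib import Path


def flatten_dataset(src_root, dst_dir, suffixes=(".ct", ".bpseq")):
    """Copy every matching file under src_root (recursively) into dst_dir.

    Hypothetical helper: the default suffixes are an assumption about the
    dataset's file types, not something taken from the UFold code.
    Keep dst_dir outside src_root so reruns do not re-copy merged files.
    """
    src_root = Path(src_root)
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # rglob walks all nested subfolders, unlike os.listdir();
    # sorted() materializes the listing before any file is copied.
    for path in sorted(src_root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # Encode the subfolder path in the file name so that files with
        # the same name in different subfolders do not overwrite each other.
        flat_name = "__".join(path.relative_to(src_root).parts)
        target = dst_dir / flat_name
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

Calling something like `flatten_dataset("RNAStralign", "RNAStralign_merged")` (both paths are placeholders) would leave a single flat folder that a script relying on os.listdir() can read. Because the original subfolder names are kept in the file names, the files can still be grouped per subfolder later if needed.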

Thanks.

sperfu, Sep 21 '22 02:09