Consistency while creating Train-Val-Test Split

Open Sohanpatnaik106 opened this issue 1 year ago • 0 comments

Hi,

While the train, validation, and test splits are being created, the dataset class uses os.listdir(). However, os.listdir() returns entries in arbitrary order that depends on the file system, so the order of the files/folders in the directory can differ between machines. This changes the splits and leads to results that differ from those reported in the paper. Could you please provide the preprocessed data split that you used for training, validation, and testing, so that the results can be reproduced and a fair comparison can be made with your method?

An alternative would be to first sort the output of os.listdir(), then shuffle the sorted list with a fixed random seed, and then split it the same way it is done currently. This would keep the splits consistent across machines.
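
To make the suggestion concrete, here is a minimal sketch of what I mean. The function name `deterministic_split`, the default seed, and the split fractions are my own placeholders and not taken from the repository; the actual ratios and split logic in the dataset class may differ.

```python
import os
import random


def deterministic_split(data_dir, seed=0, val_frac=0.1, test_frac=0.1):
    """Hypothetical helper: split directory entries reproducibly.

    The seed and fractions are illustrative assumptions, not the
    values used in the paper.
    """
    # Sorting removes the dependence on the file system's listing order.
    entries = sorted(os.listdir(data_dir))

    # A local RNG with a fixed seed gives the same shuffle on every run
    # without touching the global random state.
    rng = random.Random(seed)
    rng.shuffle(entries)

    n_test = int(len(entries) * test_frac)
    n_val = int(len(entries) * val_frac)

    test = entries[:n_test]
    val = entries[n_test:n_test + n_val]
    train = entries[n_test + n_val:]
    return train, val, test
```

With this, two machines holding the same files should get identical splits, assuming the same seed and the same Python version. Saving the resulting file lists to disk would additionally make the split independent of the shuffle implementation.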

Thanks in advance.

Aug 06 '24 07:08 Sohanpatnaik106