mmpretrain Unbalanced Samples

In my dataset, one category has a very large number of samples, so I divided the data of this category into several folders. I need to change 'train.txt' so that different folders are used in different epochs during training. How can I reload the data at each epoch during training?

Aug 16 '22 02:08 wumuyu9

From your description, the image list in the dataset changes during training, so the dataloader would have to be rebuilt. Can you explain why and how you divided train.txt?

Aug 16 '22 02:08 Ezra-Yu

Yes, the dataloader needs to be rebuilt. The dataset has more than ten categories, and one category has more than ten times as many samples as each of the others. I would like every batch to contain data from different categories. However, I don't have enough GPU capacity to use many samples in one batch. At the same time, I would like the CNN to be trained on as many samples as possible from the largest category.

Therefore, I chose a compromise: splitting the data of the largest category into several folders. In each epoch, I would like the CNN to be trained on a different part of the largest category together with the same data from the other categories. At present, I create several training files, each listing a different set of paths for the largest category and the same paths for the other categories.

Aug 16 '22 03:08 wumuyu9

You can try ClassBalancedDataset.
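
For context, ClassBalancedDataset balances classes by oversampling the rare categories rather than by subsampling the large one. The sketch below reproduces the repeat-factor idea in plain Python, assuming the rule r(c) = max(1, sqrt(oversample_thr / f(c))) rounded up, where f(c) is the fraction of samples in category c. The function name `repeat_factors` and the toy labels are illustrative only, and the exact behaviour should be checked against the installed mmcls version.

```python
import math
from collections import Counter


def repeat_factors(labels, oversample_thr):
    """Return, for each sample, how many times it would be repeated.

    Assumed rule: r(c) = max(1, sqrt(oversample_thr / f(c))), rounded up,
    where f(c) is the fraction of samples that belong to category c.
    """
    total = len(labels)
    freq = {c: n / total for c, n in Counter(labels).items()}
    per_class = {
        c: max(1.0, math.sqrt(oversample_thr / f)) for c, f in freq.items()
    }
    return [math.ceil(per_class[c]) for c in labels]


# Toy data: category 0 has 100 samples, categories 1 and 2 have 10 each.
labels = [0] * 100 + [1] * 10 + [2] * 10
factors = repeat_factors(labels, oversample_thr=0.5)

# Count how many samples each category contributes after oversampling.
counts = Counter()
for label, factor in zip(labels, factors):
    counts[label] += factor
print(dict(counts))
```

In an mmcls config, the wrapper is typically enabled by setting the training dataset to `type='ClassBalancedDataset'` with an `oversample_thr` value and nesting the original dataset dict under `dataset`.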

If you really want to do what you describe, you may need to implement a hook; refer to the doc.
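
A minimal sketch of that idea, kept free of framework imports so it runs on its own. `RotatingSubsetDataset`, `set_epoch`, and `SwitchSubsetHook` are hypothetical names, and the sketch assumes the runner exposes `epoch` and `data_loader.dataset` the way an epoch-based mmcv runner does. In a real setup the hook would subclass mmcv's `Hook`, be registered in the `HOOKS` registry, and be listed under `custom_hooks` in the config.

```python
from types import SimpleNamespace


class RotatingSubsetDataset:
    """Hypothetical dataset: all minority samples plus one majority chunk."""

    def __init__(self, minority_items, majority_items, num_chunks):
        self.minority_items = list(minority_items)
        # Split the majority class into `num_chunks` interleaved chunks.
        self.chunks = [
            list(majority_items[i::num_chunks]) for i in range(num_chunks)
        ]
        self.set_epoch(0)

    def set_epoch(self, epoch):
        # Select the chunk for this epoch, cycling through the chunks.
        chunk = self.chunks[epoch % len(self.chunks)]
        self.items = self.minority_items + chunk

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


class SwitchSubsetHook:
    """Hypothetical hook: switches the majority chunk before each epoch."""

    def before_train_epoch(self, runner):
        # Assumption: the runner exposes `epoch` and `data_loader.dataset`.
        runner.data_loader.dataset.set_epoch(runner.epoch)


# Toy (path, label) pairs: 3 minority samples and 6 majority samples.
minority = [("cat_b/%d.jpg" % i, 1) for i in range(3)]
majority = [("cat_a/%d.jpg" % i, 0) for i in range(6)]
dataset = RotatingSubsetDataset(minority, majority, num_chunks=3)

# Stand-in for the real runner, used only to exercise the hook.
runner = SimpleNamespace(epoch=0, data_loader=SimpleNamespace(dataset=dataset))
hook = SwitchSubsetHook()
for epoch in range(3):
    runner.epoch = epoch
    hook.before_train_epoch(runner)
    print(epoch, [path for path, label in dataset.items if label == 0])
```

Keeping every chunk the same size keeps the dataset length constant across epochs, which avoids having to rebuild the sampler. Whether an in-place change like this reaches the dataloader workers depends on how the dataloader is built (for example, whether workers persist across epochs), so rebuilding the dataloader may still be necessary.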

Aug 16 '22 07:08 Ezra-Yu

This issue will be closed as it is inactive. Feel free to re-open it if necessary.

Dec 12 '22 15:12 tonysy