mmpretrain

Unbalanced Samples

Open wumuyu9 opened this issue 3 years ago • 3 comments

In my dataset, one category has a very large number of samples, so I split the data of that category into several folders. I need to change 'train.txt' so that different folders are used in different epochs. How can I reload the data between epochs during training?

wumuyu9 · Aug 16 '22 02:08

From your description, the image_list in the dataset changes during training, so the dataloader has to be rebuilt. Can you explain why and how you split train.txt?

Ezra-Yu · Aug 16 '22 02:08

Yes, the dataloader needs to be rebuilt. The dataset has more than ten categories, and one category has more than ten times as many samples as any of the others. I would like every batch to contain data from different categories. However, I don't have enough GPU resources to fit many samples in one batch. At the same time, I would like the CNN to be trained on as many samples as possible from the largest category.

Therefore, as a compromise, I split the data of the largest category into several folders. In each epoch, I would like the CNN to be trained on a different part of the largest category together with the same data from the other categories. At present, I create several training files: each one lists the paths of a different part of the largest category plus the same paths for the other categories.
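For illustration, the per-epoch training files described above could be generated with a sketch like the one below. It assumes each sample is a `(path, label)` pair and that the annotation file holds one `path label` pair per line; the function names and the `train_part{i}.txt` file names are hypothetical and not part of mmcls.

```python
from pathlib import Path


def build_epoch_lists(samples, major_label, num_parts):
    """Return num_parts sample lists.

    Each list holds all samples of the minority classes plus one
    round-robin slice of the majority class.
    """
    major = [s for s in samples if s[1] == major_label]
    others = [s for s in samples if s[1] != major_label]
    # major[i::num_parts] takes every num_parts-th sample starting at i,
    # so the slices are disjoint and together cover the whole class.
    parts = [major[i::num_parts] for i in range(num_parts)]
    return [others + part for part in parts]


def write_ann_files(epoch_lists, out_dir):
    """Write one annotation file per list and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, sample_list in enumerate(epoch_lists):
        ann_path = out_dir / f"train_part{i}.txt"  # hypothetical naming
        lines = [f"{path} {label}\n" for path, label in sample_list]
        ann_path.write_text("".join(lines))
        paths.append(ann_path)
    return paths


def ann_file_for_epoch(ann_files, epoch):
    """Cycle through the annotation files, one per epoch."""
    return ann_files[epoch % len(ann_files)]
```

With three parts, epochs 0, 1, 2 would use parts 0, 1, 2, and epoch 3 would start again at part 0.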

wumuyu9 · Aug 16 '22 03:08

You can try ClassBalancedDataset.
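As a rough illustration of the idea behind that wrapper: as far as I understand it, ClassBalancedDataset repeats samples of rare classes according to a repeat factor r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of samples in class c and t is the oversampling threshold. Note that this oversamples the small classes rather than subsampling the large one. The sketch below is a standalone, single-label re-implementation of that formula for intuition only; the function name is hypothetical and it is not the library code.

```python
import math
from collections import Counter


def repeat_factors(labels, oversample_thr):
    """Per-sample repeat counts for a single-label dataset.

    Classes whose frequency is below oversample_thr are repeated;
    classes at or above it keep a factor of 1.
    """
    counts = Counter(labels)
    total = len(labels)
    class_factor = {
        c: max(1.0, math.sqrt(oversample_thr / (n / total)))
        for c, n in counts.items()
    }
    # Round up so that every sample appears at least once per epoch.
    return [math.ceil(class_factor[label]) for label in labels]


# Example: one class with 100 samples, two classes with 10 samples each.
labels = [0] * 100 + [1] * 10 + [2] * 10
factors = repeat_factors(labels, oversample_thr=0.3)
```

In this example the large class keeps a factor of 1, while the two small classes are repeated, which narrows the imbalance without changing the annotation file between epochs.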

If you really want to do what you describe, you may need to implement a hook; refer to the doc.
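A minimal sketch of what such a hook might look like is given below. It is duck-typed and does not import mmcv; it assumes the runner exposes an `epoch` attribute, calls `before_train_epoch(runner)` on its hooks, and iterates over `runner.data_loader` afterwards, which should be checked against the installed version. `SwitchAnnFileHook` and `build_dataloader` are hypothetical names, and in a real project the class would presumably subclass the framework's hook base class and be registered in the config.

```python
from types import SimpleNamespace


class SwitchAnnFileHook:
    """Swap the training dataloader at the start of every epoch (sketch)."""

    def __init__(self, ann_files, build_dataloader):
        self.ann_files = list(ann_files)
        # Hypothetical callable: takes an annotation file path and
        # returns a freshly built dataloader for it.
        self.build_dataloader = build_dataloader

    def before_train_epoch(self, runner):
        # Cycle through the annotation files, one per epoch.
        ann_file = self.ann_files[runner.epoch % len(self.ann_files)]
        # Assumption: the runner reads this attribute after the hook runs.
        runner.data_loader = self.build_dataloader(ann_file)


# Stand-in objects to show the call pattern; not real mmcv classes.
fake_runner = SimpleNamespace(epoch=3, data_loader=None)
hook = SwitchAnnFileHook(
    ["train_part0.txt", "train_part1.txt"],
    build_dataloader=lambda ann_file: ["loader for", ann_file],
)
hook.before_train_epoch(fake_runner)
```

If the parts differ in size, anything that depends on the number of iterations per epoch (for example an iteration-based learning rate schedule) may need extra care.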

Ezra-Yu · Aug 16 '22 07:08

This issue will be closed as it is inactive, feel free to re-open it if necessary.

tonysy · Dec 12 '22 15:12