mdistiller
The training loss
Thanks for your great work! When I run the code using:
python3 tools/train.py --cfg configs/imagenet/r34_r18/dot.yaml
The training loss is much larger than the KD method's in the first few epochs, and the test accuracy is also low. Is this normal?
The loss scale is too large. Did you change the batch-size or num-gpus?
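Changing the batch size or GPU count changes the effective batch size, which usually requires rescaling the learning rate to keep training stable. A minimal sketch of the linear scaling rule (the `base_lr` and `base_batch_size` values here are hypothetical, not taken from the repo's configs):

```python
def scale_lr(base_lr, base_batch_size, batch_size):
    """Linear scaling rule: scale the learning rate in proportion
    to the ratio of the new effective batch size to the reference one."""
    return base_lr * batch_size / base_batch_size

# e.g. halving the batch size halves the learning rate
print(scale_lr(0.1, 512, 256))  # → 0.05
```

If the batch size was reduced without lowering the learning rate accordingly, an inflated loss in the first epochs would be one plausible symptom.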
@Zzzzz1 I use the original batch size 512 on 8 2080Ti GPUs. After re-running the code, I got the following results:
It still seems unstable and is much worse than vanilla KD.
@Vickeyhw How long does an epoch take for you? I find it very strange that it takes me 100 minutes to run 1/4 of an epoch on 8×3090s.
@JinYu1998 23min/epoch.
Thanks for your response. I think I've identified the problem: since my data is not on an SSD, an I/O bottleneck is slowing down training.
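One way to confirm an I/O bottleneck like this is to time how long each batch takes to arrive from the data loader. This generic sketch works on any iterable of batches (a PyTorch `DataLoader` included); the function name and batch count are illustrative, not from the repo:

```python
import time

def time_batches(loader, n=10):
    """Measure the wall-clock gap between consecutive batches.
    Consistently long gaps (relative to the GPU step time) suggest
    the pipeline is data-bound rather than compute-bound."""
    gaps = []
    start = time.perf_counter()
    for i, _batch in enumerate(loader):
        now = time.perf_counter()
        gaps.append(now - start)
        start = now
        if i + 1 >= n:
            break
    return gaps
```

If these gaps dominate the step time, increasing `num_workers`, enabling `pin_memory`, or moving the dataset to local SSD storage are the usual remedies.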