Sel-CL
Training Efficiency
This is good work. However, the training efficiency is rather low.
In the training stage, GPU utilization is about 50-60%. In the "pair-wise selection" stage, GPU utilization is approximately 0%.
At first, I thought this was because the program runs CPU operations in the "pair-wise selection" stage. I added timing checkpoints inside the "pair-wise selection" stage and found that most of the time is spent on "Weighted k-NN correction" (code link).
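A minimal sketch of this kind of checkpoint timing is shown here for anyone who wants to reproduce the measurement. The stage names and the `time.sleep` workloads are hypothetical stand-ins, not the repository's code, and the helper names `checkpoint` and `slowest_stage` are my own. If a stage launches CUDA kernels, calling `torch.cuda.synchronize()` before each clock read would be needed for accurate numbers, since kernel launches are asynchronous.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage.
stage_seconds = defaultdict(float)


@contextmanager
def checkpoint(name):
    """Add the wall-clock time of the enclosed block to stage_seconds[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[name] += time.perf_counter() - start


def slowest_stage():
    """Return the name of the stage with the largest accumulated time."""
    return max(stage_seconds, key=stage_seconds.get)


# Hypothetical stand-ins for the real stages; replace the sleeps with the
# actual calls made inside the pair-wise selection stage.
with checkpoint("feature_extraction"):
    time.sleep(0.01)
with checkpoint("weighted_knn_correction"):
    time.sleep(0.05)

# Print stages from slowest to fastest.
for name, seconds in sorted(stage_seconds.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds:.3f} s")
```

Wrapping each sub-step in its own `checkpoint` block makes it easy to see which one dominates without changing the training logic.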
I look forward to your suggestions for making training more efficient.