Training speed significantly decreases after completing a stage in AI toolkit trainer
Body: The AI toolkit trainer runs very fast initially. However, after finishing one training stage and starting the next, training becomes extremely slow. I have to restart the trainer to restore the speed and continue training. Labels: performance-issue, training-speed, stage-switch, restart-required
#504
same
Check your VRAM usage. There seems to be a bug that leaves something in shared memory after a checkpoint is saved. Saving a checkpoint overloads your VRAM (even with 10-20 GB of VRAM free), and afterwards the trainer keeps using shared memory. That is what causes the slowdown. The memory should be released properly.
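If you want to check this yourself, here is a rough sketch that reads dedicated VRAM usage through `nvidia-smi`. It assumes an NVIDIA GPU with `nvidia-smi` on the PATH. The helper names (`parse_memory_csv`, `query_gpu_memory`, `format_report`) are made up for illustration and are not part of the AI toolkit. Note that `nvidia-smi` only reports dedicated VRAM, so shared memory usage still has to be checked separately (for example in Windows Task Manager).

```python
import subprocess


def parse_memory_csv(output: str) -> list[tuple[int, int]]:
    """Parse 'used, total' pairs in MiB, one line per GPU."""
    pairs = []
    for line in output.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        pairs.append((used, total))
    return pairs


def query_gpu_memory() -> list[tuple[int, int]]:
    """Ask nvidia-smi for used and total dedicated VRAM of every GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)


def format_report(pairs: list[tuple[int, int]]) -> str:
    """Render one readable line per GPU."""
    return "\n".join(
        f"GPU {index}: {used}/{total} MiB ({used / total:.0%})"
        for index, (used, total) in enumerate(pairs)
    )
```

Running `print(format_report(query_gpu_memory()))` once before and once after a checkpoint save should show whether dedicated VRAM stays close to full after the save, which would match the behavior described above.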
May I ask how to turn off the shared memory function?