dataset-distillation
How to distribute computation across multiple GPUs for large models
Hi, I am trying to use VGG to distill the images, but the backpropagation graph is too large to run the program: it costs 38 GB of GPU memory to distill 10 images for CIFAR-10. Note that I use only one model for the distillation, so the method in advanced.md does not work in this situation. Could you provide some solutions for that? Many thanks!
Best, Yugeng
Conceptually, a couple of strategies could help:
- distribute different unrolled steps to different GPUs, so each GPU only stores part of the graph (see the first sketch below);
- use gradient checkpointing to recompute early steps' graphs during the backward pass rather than storing them (see the second sketch below).

Neither is directly supported by the provided code, so they would require additional effort to implement.
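Here is a minimal sketch of the first strategy. It assumes a purely functional forward pass (`forward_fn` takes the flattened parameters explicitly) and a plain SGD inner step; both are hypothetical stand-ins for the repo's actual training loop, not its real API. The key point is that `.to(device)` is itself differentiable, so autograd can backpropagate across device boundaries while each step's saved activations stay on the GPU that executed it.

```python
import torch
import torch.nn.functional as F

def distributed_unroll(flat_params, distilled_x, distilled_y, lr, forward_fn,
                       n_steps, devices=("cuda:0", "cuda:1")):
    """Unroll n_steps of training on the distilled data, round-robining
    the steps over `devices` so the stored graph is spread across GPUs.

    `forward_fn(params, x)` is a hypothetical functional forward pass;
    `lr` is a plain scalar learning rate.
    """
    for step in range(n_steps):
        dev = torch.device(devices[step % len(devices)])
        # Moving tensors is tracked by autograd, so gradients flow back
        # across device boundaries during the outer backward pass.
        p = flat_params.to(dev)
        x = distilled_x.to(dev)
        y = distilled_y.to(dev)
        loss = F.cross_entropy(forward_fn(p, x), y)
        # create_graph=True keeps the step differentiable w.r.t. the
        # distilled data, which the outer objective needs.
        grads = torch.autograd.grad(loss, p, create_graph=True)[0]
        flat_params = p - lr * grads
    return flat_params
```

After the unroll, you would evaluate the final parameters on real data and call `.backward()` on that loss; the gradients then flow through every step, GPU by GPU, back to `distilled_x` and `distilled_y`.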
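And a sketch of the second strategy, wrapping each unrolled step in `torch.utils.checkpoint.checkpoint` so its activations are discarded after the forward pass and recomputed during backward, trading compute for memory. The same hypothetical `forward_fn` and SGD step are assumed; whether checkpointing composes cleanly with a step that itself calls `torch.autograd.grad` is worth verifying on your PyTorch version.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def inner_step(flat_params, distilled_x, distilled_y, lr, forward_fn):
    """One differentiable training step on the distilled data."""
    loss = F.cross_entropy(forward_fn(flat_params, distilled_x), distilled_y)
    grads = torch.autograd.grad(loss, flat_params, create_graph=True)[0]
    return flat_params - lr * grads

def checkpointed_unroll(flat_params, distilled_x, distilled_y, lr, forward_fn,
                        n_steps):
    for _ in range(n_steps):
        # Only the step's inputs are kept; the activations inside the
        # step are recomputed when the outer loss is backpropagated.
        # use_reentrant=False also permits non-tensor args (forward_fn, lr).
        flat_params = checkpoint(
            inner_step, flat_params, distilled_x, distilled_y, lr, forward_fn,
            use_reentrant=False,
        )
    return flat_params
```

With checkpointing, peak memory grows with the size of one step's graph rather than all steps combined, at the cost of roughly one extra forward pass per step.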