
How to distribute different GPUs for some large models

Open liuyugeng opened this issue 2 years ago • 1 comments

Hi, I am trying to use VGG to distill the images, but the gradient graph is too large to fit in memory: it costs 38 GB of GPU memory to distill just 10 images for CIFAR-10. Note that I use only one model for the distillation, so the multi-GPU method in advanced.md doesn't apply in this situation. Could you suggest some solutions for this? Many thanks!

Best, Yugeng

liuyugeng avatar Mar 22 '22 19:03 liuyugeng

Conceptually, a couple of strategies can be used:

  1. distribute different steps to different GPUs
  2. use gradient checkpointing to recompute early steps' graphs rather than storing them.

Neither is directly supported by the provided code, so they require additional effort to implement.
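To illustrate strategy 2, here is a minimal sketch of gradient checkpointing with `torch.utils.checkpoint.checkpoint`. The tiny `nn.Sequential` model, the `inner_step` helper, and the flattened random inputs are all stand-ins for illustration (not part of this repo's code); the point is only that a checkpointed step discards its intermediate activations during the forward pass and recomputes them during backward, trading extra compute for memory:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

# Hypothetical small net standing in for VGG in the distillation inner loop.
net = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))

# Distilled images are the leaves we optimize (flattened to vectors here).
distilled = torch.randn(10, 32, requires_grad=True)
target = torch.randint(0, 10, (10,))

def inner_step(x):
    # One unrolled inner-loop step. Wrapping it in checkpoint() means its
    # activations are NOT stored; they are recomputed when backward reaches
    # this step, so peak memory no longer grows with every stored step.
    return net(x)

out = checkpoint(inner_step, distilled, use_reentrant=False)
loss = F.cross_entropy(out, target)
loss.backward()  # gradient still flows back to the distilled images
```

In the actual distillation loop you would wrap each (or every few) unrolled training steps in `checkpoint`, so only the step boundaries' activations stay resident instead of the full 38 GB graph.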

ssnl avatar Mar 22 '22 20:03 ssnl