dataset-distillation
How to distribute computation across multiple GPUs for large models
Hi, I am trying to use VGG to distill the images, but the backpropagation graph is too large to run the program: it costs 38 GB of GPU memory to distill 10 images for CIFAR-10. Note that I use only one model for the distillation, so the method in advanced.md does not work in this situation. Could you provide some solutions for that? Many thanks!
Best, Yugeng
Conceptually, a couple of strategies could help:
- distribute different unrolled steps to different GPUs, so each GPU only stores part of the graph (see the first sketch below);
- use gradient checkpointing to recompute early steps' graphs during the backward pass rather than storing them (see the second sketch below).

Neither is directly supported by the provided code, so they would require additional effort to implement.
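Here is a minimal sketch of the first strategy. It assumes a purely functional forward pass (`forward_fn` takes the flattened parameters explicitly) and a plain SGD inner step; both are hypothetical stand-ins for the repo's actual training loop, not its real API. The key point is that `.to(device)` is itself differentiable, so autograd can backpropagate across device boundaries while each step's saved activations stay on the GPU that executed it.

```python
import torch
import torch.nn.functional as F

def distributed_unroll(flat_params, distilled_x, distilled_y, lr, forward_fn,
                       n_steps, devices=("cuda:0", "cuda:1")):
    """Unroll n_steps of training on the distilled data, round-robining
    the steps over `devices` so the stored graph is spread across GPUs.

    `forward_fn(params, x)` is a hypothetical functional forward pass;
    `lr` is a plain scalar learning rate.
    """
    for step in range(n_steps):
        dev = torch.device(devices[step % len(devices)])
        # Moving tensors is tracked by autograd, so gradients flow back
        # across device boundaries during the outer backward pass.
        p = flat_params.to(dev)
        x = distilled_x.to(dev)
        y = distilled_y.to(dev)
        loss = F.cross_entropy(forward_fn(p, x), y)
        # create_graph=True keeps the step differentiable w.r.t. the
        # distilled data, which the outer objective needs.
        grads = torch.autograd.grad(loss, p, create_graph=True)[0]
        flat_params = p - lr * grads
    return flat_params
```

After the unroll, you would evaluate the final parameters on real data and call `.backward()` on that loss; the gradients then flow through every step, GPU by GPU, back to `distilled_x` and `distilled_y`.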
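And a sketch of the second strategy, wrapping each unrolled step in `torch.utils.checkpoint.checkpoint` so its activations are discarded after the forward pass and recomputed during backward, trading compute for memory. The same hypothetical `forward_fn` and SGD step are assumed; whether checkpointing composes cleanly with a step that itself calls `torch.autograd.grad` is worth verifying on your PyTorch version.

```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def inner_step(flat_params, distilled_x, distilled_y, lr, forward_fn):
    """One differentiable training step on the distilled data."""
    loss = F.cross_entropy(forward_fn(flat_params, distilled_x), distilled_y)
    grads = torch.autograd.grad(loss, flat_params, create_graph=True)[0]
    return flat_params - lr * grads

def checkpointed_unroll(flat_params, distilled_x, distilled_y, lr, forward_fn,
                        n_steps):
    for _ in range(n_steps):
        # Only the step's inputs are kept; the activations inside the
        # step are recomputed when the outer loss is backpropagated.
        # use_reentrant=False also permits non-tensor args (forward_fn, lr).
        flat_params = checkpoint(
            inner_step, flat_params, distilled_x, distilled_y, lr, forward_fn,
            use_reentrant=False,
        )
    return flat_params
```

With checkpointing, peak memory grows with the size of one step's graph rather than all steps combined, at the cost of roughly one extra forward pass per step.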