
Add drop path schedule

Open leng-yue opened this issue 2 years ago • 5 comments

Update the drop path schedule to adhere to the original implementation found in DINOv2, and add an efficient drop path to accelerate training. #1836

Given a 40% drop rate, we see a ~38% improvement in training time (eval time is essentially unchanged):

- ViT-L/14 eval took 8.701655239999809
- ViT-L/14 with efficient drop path eval took 8.702854548999994
- ViT-L/14 train took 8.81138907400009
- ViT-L/14 with efficient drop path train took 5.4026294970001345

Ref: DINOv2. Benchmark: Colab
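For context, here is a minimal sketch of what 'linear' vs. 'uniform' per-block drop path rates could look like; the helper `drop_path_rates` below is hypothetical and not the PR's actual code (DINOv2 exposes a uniform option via `drop_path_uniform`, while timm's usual rule ramps the rate linearly across depth):

```python
import torch

def drop_path_rates(drop_path_rate: float, depth: int, schedule: str = "linear"):
    """Hypothetical helper: per-block stochastic-depth rates.

    'linear' ramps from 0 up to drop_path_rate over the blocks (timm's usual
    decay rule); 'uniform' applies the same rate to every block, matching
    DINOv2's drop_path_uniform option.
    """
    assert schedule in ("linear", "uniform")
    if schedule == "uniform":
        return [drop_path_rate] * depth
    return [r.item() for r in torch.linspace(0, drop_path_rate, depth)]

print(drop_path_rates(0.4, 4, "linear"))   # [0.0, 0.1333..., 0.2667..., 0.4]
print(drop_path_rates(0.4, 4, "uniform"))  # [0.4, 0.4, 0.4, 0.4]
```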

leng-yue avatar May 31 '23 05:05 leng-yue

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Questions: summarize me the PR in 5 simple lines

Answer:

1. The PR introduces an efficient drop path schedule to accelerate training, inspired by the original implementation found in DINOv2.
2. A new class EfficientDropPathBlock is added to the vision_transformer.py file, which implements the efficient drop path.
3. The PR also includes the addition of a new optimizer, NAdamW, in the nadamw.py file.
4. The drop path schedule can be either 'linear' or 'uniform', and this is asserted in the VisionTransformer's __init__ method.
5. Several new models are registered in the vision_transformer.py file, including 'vit_huge_patch14_224_ijepa', 'vit_huge_patch16_448_ijepa', and 'vit_gigantic_patch16_224_ijepa'.
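Regarding the NAdamW optimizer mentioned in item 3: NAdamW is essentially NAdam with decoupled, AdamW-style weight decay. A rough equivalent can be sketched with stock PyTorch (this is not the PR's nadamw.py implementation, and it assumes PyTorch >= 2.1, where `torch.optim.NAdam` accepts `decoupled_weight_decay`):

```python
import torch
from torch import nn

model = nn.Linear(16, 4)

# NAdam + decoupled (AdamW-style) weight decay ~= "NAdamW".
optimizer = torch.optim.NAdam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    weight_decay=0.05,
    decoupled_weight_decay=True,  # decay weights directly instead of via the gradient
)

loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```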

CodiumAI-Agent avatar Jul 06 '23 08:07 CodiumAI-Agent

Questions: Write me a poem about the PR

Answer: In the realm of code, where logic is the key, A pull request emerges, for all to see. A vision transformer, it seeks to enhance, With an efficient drop path, it takes a stance.

A new class is born, EfficientDropPathBlock, In the depths of Python, it begins to walk. With layers and norms, it's a sight to behold, A tale of performance improvement, waiting to be told.

Drop path schedules, linear and uniform, In the dance of training, they perform. A boost in speed, a significant gain, Like a parched desert, welcoming the rain.

New models are added, with a careful touch, vit_huge, vit_gigantic, they offer so much. In the world of AI, they're the new stars, Guiding us forward, towards Mars.

NAdamW optimizer, a new addition, Enhancing the training with its mission. With beta, learning rate, and decay, It optimizes in a unique way.

A poem of code, of additions and more, A story of progress, it has in store. In the world of git, it's a common sight, A pull request, shining bright.

CodiumAI-Agent avatar Jul 06 '23 08:07 CodiumAI-Agent

@leng-yue Any insight on why this is faster? If the entire batch is dropped (instead of randomly chosen rows), then I guess this doesn't bring any gains?

netw0rkf10w avatar Jul 06 '23 10:07 netw0rkf10w

The previous implementation didn't actually drop paths: the tokens were still fed through the FFN and attention, whereas the efficient drop path truly skips these computations.
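To make the difference concrete, here is a hedged sketch of the idea (not the actual EfficientDropPathBlock from the PR): instead of computing the residual branch for every sample and then zeroing some rows, only the kept rows are run through the branch, and the rescaled result is added back.

```python
import torch

def efficient_drop_path_residual(x, residual_fn, drop_prob: float, training: bool = True):
    """Stochastic depth that skips compute for dropped samples.

    Standard DropPath evaluates residual_fn on the full batch and then masks
    some rows to zero; here residual_fn (e.g. a block's attention or MLP
    branch) is only evaluated on the kept subset, and the rescaled output is
    scattered back into the residual stream.
    """
    if drop_prob == 0.0 or not training:
        return x + residual_fn(x)

    b = x.shape[0]
    n_keep = max(int(b * (1.0 - drop_prob)), 1)
    keep = torch.randperm(b, device=x.device)[:n_keep]

    out = residual_fn(x[keep])          # heavy compute only on the kept rows
    scale = b / n_keep                  # keep the expected residual unchanged
    return torch.index_add(x, 0, keep, out, alpha=scale)

# Toy usage: the "branch" is just a linear layer here.
branch = torch.nn.Linear(8, 8)
x = torch.randn(16, 4, 8)               # (batch, tokens, dim)
y = efficient_drop_path_residual(x, branch, drop_prob=0.4)
print(y.shape)                           # torch.Size([16, 4, 8])
```

This follows the spirit of DINOv2's stochastic-depth trick of sub-sampling the batch before the expensive branch; the PR's exact implementation may differ in details.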

leng-yue avatar Jul 06 '23 17:07 leng-yue