ActionCLIP
TinyCLIP integration for ActionCLIP
This PR integrates two TinyCLIP ViT models into the existing model framework with minimal changes. This is possible because TinyCLIP, like CLIP, provides a pure ViT-based model. TinyCLIP is a distillation of CLIP that gives significant speed-ups over the CLIP model while retaining, and in some cases improving on, its zero-shot ImageNet-1K (IN1K) accuracy. To accommodate the integration, the PR adds a small state_dict conversion helper method and an optional flag to ignore the sha256 check.
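As a rough illustration of the two additions, the sketch below shows what a state_dict key conversion and an optional sha256 bypass could look like. It is not the code in this PR: the function names `convert_state_dict` and `checksum_ok`, the `prefix_map` argument, and the example key prefixes are all hypothetical, and plain Python values stand in for tensors so the snippet runs without PyTorch.

```python
import hashlib
from collections import OrderedDict


def convert_state_dict(state_dict, prefix_map):
    """Return a copy of state_dict with key prefixes rewritten.

    prefix_map maps an old prefix to its replacement. The first matching
    prefix wins; keys that match no prefix are copied unchanged.
    (Hypothetical helper, not the PR's actual implementation.)
    """
    converted = OrderedDict()
    for key, value in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if key.startswith(old):
                new_key = new + key[len(old):]
                break
        converted[new_key] = value
    return converted


def checksum_ok(payload, expected_sha256, ignore_sha256=False):
    """Compare payload bytes against a hex digest, unless told to skip."""
    if ignore_sha256:
        return True
    return hashlib.sha256(payload).hexdigest() == expected_sha256


# Assumed key layout, for illustration only.
checkpoint = {
    "image_encoder.visual.conv1.weight": 1,
    "text_encoder.token_embedding.weight": 2,
    "logit_scale": 3,
}
renamed = convert_state_dict(
    checkpoint, {"image_encoder.": "", "text_encoder.": ""}
)
print(list(renamed))
```

The bypass flag defaults to `False` in this sketch, so checksum verification stays on unless a caller explicitly opts out for a checkpoint that has no published hash.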
The graphs below give a rough indication of ActionCLIP's behaviour during training on HMDB51 (no pre-training). "Train step" is the number of batches processed per minute of wall-clock time. The TinyCLIP-based ActionCLIP model trains much faster, while its performance is close to that of vanilla CLIP.