
Simple questions about dataset preparation before training DINOv2

Open EddieAy opened this issue 1 year ago • 4 comments

@woctezuma Sorry to bother you again, but I just want to know how I can prepare a correct ImageNet-like dataset.

  • For example, training DINOv1 is simple: it just uses an ImageFolder.

  • The picture below shows the dataset preparation process in DINOv1: [image]

  • And this is my dataset structure. It needs no further adjustment when I train DINOv1, because this is the layout ImageFolder expects: [image]

  • BUT NOW, when it comes to DINOv2, it shows this: [image]

  • So do I need a <ROOT>/labels.txt like the one below? (And I don't have a TEST SET.) [image]

  • Although I don't know what the files under 'extra' are, when I follow the code below: [image]

  • I got this. How can I fix it? How do I get started training DINOv2? [image]

  • And this is my dataset structure: [image]

Thank you @woctezuma
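For reference, a <ROOT>/labels.txt for an ImageFolder-style tree (<ROOT>/train/<class_dir>/*.jpg) could be generated with a short script like the one below. The one-line-per-class `class_dir,class_name` format is only an assumption; check how dinov2/data/datasets/image_net.py actually parses its labels file before relying on it.

```python
from pathlib import Path


def write_labels_txt(root: str) -> Path:
    """Write <root>/labels.txt with one 'class_dir,class_name' line per
    class directory found under <root>/train. The format is assumed, not
    taken from the DINOv2 repository -- verify against image_net.py."""
    root_path = Path(root)
    class_dirs = sorted(d.name for d in (root_path / "train").iterdir() if d.is_dir())
    out = root_path / "labels.txt"
    # With a plain ImageFolder layout there is no separate human-readable
    # name, so the directory name is reused as the class name.
    out.write_text("\n".join(f"{c},{c}" for c in class_dirs) + "\n")
    return out
```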

EddieAy avatar May 23 '23 16:05 EddieAy

+1, it would be better if we could just use an ImageFolder

XiaohuJoshua avatar May 25 '23 07:05 XiaohuJoshua

@EddieAy @XiaohuJoshua Did you solve the problem?

onvungocminh avatar Jun 27 '23 09:06 onvungocminh

I have the same doubt.

HDL-YD avatar Oct 16 '23 07:10 HDL-YD

There are several ways to train a linear head for classification on your data:

  1. Downstream-Dino-V2 repository
  2. HF Transformers version of DINOv2 with the example notebook for the classification
  3. Modify this repository a little bit.

I advocate for the last approach because it trains many linear heads simultaneously with different parameters (learning rate, average pooling or not, and different blocks from the backbone). I will use the imagenette dataset, but you can, of course, use whatever you want.
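Whichever route you choose, the underlying recipe is the same: freeze the backbone and train only a linear classification head on its output features. Below is a minimal PyTorch sketch of that recipe with a dummy stand-in backbone; the dimensions and names are illustrative (the real frozen backbone would be a DINOv2 ViT, e.g. embed_dim 384 for ViT-S/14).

```python
import torch
import torch.nn as nn

# Stand-in for a frozen DINOv2 backbone: any module mapping images to a
# fixed-size embedding. 384 matches ViT-S/14; the Flatten+Linear here is
# just a placeholder so the sketch runs without downloading weights.
embed_dim, num_classes = 384, 10
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, embed_dim))
for p in backbone.parameters():
    p.requires_grad = False  # freeze: only the head is trained
backbone.eval()

head = nn.Linear(embed_dim, num_classes)
opt = torch.optim.SGD(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 32, 32)              # dummy batch
labels = torch.randint(0, num_classes, (8,))    # dummy labels

with torch.no_grad():        # no gradients through the frozen backbone
    feats = backbone(images)
logits = head(feats)
loss = loss_fn(logits, labels)
loss.backward()              # gradients flow into the head only
opt.step()
```

The repository's linear.py does essentially this, but for many head configurations at once.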

  • Create a file with a torchvision-style dataset, as usual, under dinov2/data/datasets
  • Additionally, implement a get_targets method, which should return the target label for every sample (the number of classes is derived from its unique values). This method is used here: https://github.com/facebookresearch/dinov2/blob/2302b6bf46953431b969155307b9bed152754069/dinov2/eval/linear.py#L499
  • Here is the full source code for reference.
  • Add the following line in dinov2/data/datasets/__init__.py file:
from .imagenette import Imagenette
  • Also import your dataset here https://github.com/facebookresearch/dinov2/blob/2302b6bf46953431b969155307b9bed152754069/dinov2/data/loaders.py#L13
  • Add the download key (or any other keys you are using) to the list of available keys here https://github.com/facebookresearch/dinov2/blob/2302b6bf46953431b969155307b9bed152754069/dinov2/data/loaders.py#L52
  • Add your dataset here https://github.com/facebookresearch/dinov2/blob/2302b6bf46953431b969155307b9bed152754069/dinov2/data/loaders.py#L55-L60
elif name == "imagenette":
    class_ = Imagenette
  • Launch training (don't forget to update paths or any other parameters):
CUDA_VISIBLE_DEVICES='0' python ./dinov2/eval/linear.py --config-file ./dinov2/configs/eval/vits14_pretrain.yaml --pretrained-weights ./weights/dinov2_vits14_pretrain.pth --output-dir results/dinov2_vits14/without_rtokens/linear/imagenette --train-dataset imagenette:split=train:root=./data/imagenette:download=True --val-dataset imagenette:split=val:root=./data/imagenette --batch-size 128
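The custom dataset from the first step could look roughly like the sketch below. It is a self-contained, plain-Python illustration with invented names; the repository's own datasets subclass ExtendedVisionDataset rather than this standalone class, so treat it as a shape to follow, not the actual code.

```python
from pathlib import Path


class SimpleImageDataset:
    """Minimal ImageFolder-style dataset sketch with the get_targets()
    hook that dinov2/eval/linear.py expects. Names are illustrative."""

    IMG_EXTS = {".jpg", ".jpeg", ".png"}

    def __init__(self, root, split="train", transform=None):
        self.root = Path(root) / split
        self.transform = transform
        # One class per subdirectory, indexed in sorted order.
        self.classes = sorted(d.name for d in self.root.iterdir() if d.is_dir())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}
        self.samples = [
            (p, self.class_to_idx[c])
            for c in self.classes
            for p in sorted((self.root / c).iterdir())
            if p.suffix.lower() in self.IMG_EXTS
        ]

    def __len__(self):
        return len(self.samples)

    def get_targets(self):
        # linear.py derives the number of classes from the unique
        # values of this list (one integer label per sample).
        return [label for _, label in self.samples]

    def __getitem__(self, idx):
        from PIL import Image  # lazy import: only needed to load pixels
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```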

Feel free to contact me if you have any issues.

bruce-willis avatar Feb 22 '24 16:02 bruce-willis