Confusion regarding the number of epochs the HIPT model was pre-trained for

Open ramprs21 opened this issue 2 years ago • 6 comments

Thank you for the great work and for sharing the code.

The paper mentions that the model was trained for 400K iterations with a batch size of 256, which amounts to 102,400,000 patches, roughly the size of the dataset used for pretraining. So it seems the model was trained for just 1 epoch, but the training command in the README

python -m torch.distributed.launch --nproc_per_node=8 main_dino.py --arch vit_small --data_path /path/to/TCGA_PRETRAINING_DIR/patch_256_pretraining/ --output_dir /path/to/TCGA_PRETRAINING_DIR/ckpts/pretrain/ --epochs 100

seems to suggest that it was pretrained for 100 epochs. Could you please clarify this detail? Thanks in advance.

ramprs21 avatar Jul 15 '22 22:07 ramprs21

Hi @ramprs21

You are right that the model is essentially trained for "1 epoch". We reported training in terms of iterations to avoid confusion, but it seems that reporting iterations can be confusing as well!
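
For anyone doing the same unit conversion, here is a quick back-of-the-envelope check (the ~102.4M patch count is the figure quoted in the question above, used only for illustration):

```python
# Rough iteration <-> epoch conversion for the reported pre-training setup.
iterations = 400_000          # reported training length
batch_size = 256              # reported (effective) batch size
num_patches = 102_400_000     # approximate pre-training set size (from the question above)

patches_seen = iterations * batch_size   # 102,400,000
epochs = patches_seen / num_patches      # ~1.0
print(f"patches seen: {patches_seen:,} -> ~{epochs:.1f} epoch(s)")
```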

An erratum we may also make to the arXiv: in the 1st paragraph on page 6, the warm-up is still reported in terms of epochs (in reality, the first 40,000 iterations were used for warm-up).

The commands I provided in the README do not reflect the hyper-parameters actually used; I will make an update soon!

Richarizardd avatar Jul 15 '22 23:07 Richarizardd

Thanks for the clarification @Richarizardd :)

ramprs21 avatar Jul 15 '22 23:07 ramprs21

Hi Richard, Could you please let us know a little bit more about the training set up (# of GPUs and types) and how long it took to pre-train? Thanks

ramprs21 avatar Jul 18 '22 18:07 ramprs21

Hi @ramprs21 - Thank you for the note. I will reflect it in the README soon. Pretraining required 2-4x A100s (for a batch size of 256) and took ~two weeks.

To comment on DINO: a great thing I have found about DINO is how data-efficient and forgiving it is w.r.t. low batch sizes (see the ablation experiments on the last page), in contrast with other pretraining methods (SimCLR, MAE) that report results with batch sizes of 1024-2048. I imagine that even with low batch sizes (and given that CPATH images have less variation than natural images), DINO would perform well.
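
For reference, the effective (global) batch size in this kind of torch.distributed launch is just the per-GPU batch size times the total number of processes; a minimal sketch, assuming one process per GPU and DINO's --batch_size_per_gpu argument (default 64):

```python
# Effective batch size under torch.distributed.launch with one process per GPU.
def effective_batch_size(nnodes: int, nproc_per_node: int, batch_size_per_gpu: int) -> int:
    return nnodes * nproc_per_node * batch_size_per_gpu

# A global batch of 256 on a single node with 4 GPUs implies 64 images per GPU
# (DINO's default --batch_size_per_gpu):
print(effective_batch_size(nnodes=1, nproc_per_node=4, batch_size_per_gpu=64))  # 256
```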

Richarizardd avatar Jul 21 '22 22:07 Richarizardd

Thanks @Richarizardd. That makes sense.

Just to clarify your previous comment, where you mentioned that the model was trained for just 1 epoch:

The default value for freeze_last_layer is 1 here, which means the last layer is frozen during the first (and in this case only) epoch. I'm wondering whether this should have been set to 0 instead?
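
For context, the concern comes from the way DINO cancels gradients on the projection head's last layer during the first freeze_last_layer epochs; a simplified paraphrase of that logic (not the exact repository code) looks like this:

```python
# Simplified paraphrase of DINO's last-layer freezing (illustrative, not the exact source).
# With freeze_last_layer=1 and a single training epoch (epoch index 0), the early-return
# condition never triggers, so the last layer's gradients are dropped for the entire run,
# which is the concern raised above.
def cancel_gradients_last_layer(epoch, model, freeze_last_layer):
    if epoch >= freeze_last_layer:
        return  # past the freeze window: train the last layer normally
    for name, param in model.named_parameters():
        if "last_layer" in name:
            param.grad = None  # discard the gradient before optimizer.step()
```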

ramprs21 avatar Jul 21 '22 22:07 ramprs21

Dear @Richarizardd and @faisalml, first, congratulations on this impressive and very promising work; making the code public is much appreciated. To clarify your previous comment @Richarizardd, was your effective batch size equal to 2x4x256=2048 (2 nodes x 4 GPUs x 256 images per GPU), or instead 2x4x32=256 (2 nodes x 4 GPUs x 32 images per GPU)? In the latter case, should the default value in the parser be 32 rather than 64?

Also, could you give us an update regarding @ramprs21's comment? I guess you are not performing warm-up epochs either, since you are training for only 1 epoch? Thank you very much! Have a great day ☺️

afilt avatar Oct 11 '22 16:10 afilt