Final optimizer state for the model

Open shubhamjain0594 opened this issue 2 years ago • 11 comments

Hello @lyakaap

Thanks a lot for this work. I am trying to take this model and fine-tune it on another task. Could you provide the final optimizer state after the 4th stage of training? We want to try an experiment where it would be very useful.

Thank you.

shubhamjain0594 avatar May 05 '22 11:05 shubhamjain0594

Thanks for your interest in my work.

You can download the checkpoint including optimizer state here: https://drive.google.com/file/d/1Z9G2yhYep0woJuKitaLJ2W06WUHWxbAv/view?usp=sharing
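If it helps, here is a minimal sketch of restoring both the model and the optimizer state from it. The model/optimizer setup mirrors the v107 command later in this thread, the filename is a placeholder for the downloaded file, and the checkpoint key names are assumptions based on the usual torch.save layout, so inspect the file to confirm:

```python
import timm
import torch

# Hypothetical setup mirroring the v107 training command in this thread;
# adjust to your own model and optimizer.
model = timm.create_model("tf_efficientnetv2_m_in21ft1k", pretrained=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, weight_decay=1e-6)

# Load on CPU so this works without a GPU; the filename is a placeholder.
checkpoint = torch.load("checkpoint.pth.tar", map_location="cpu")

# Key names are assumptions from the common torch.save checkpoint layout;
# run print(checkpoint.keys()) to confirm them for this file. DDP training
# typically prefixes parameter names with "module.", hence the rename and
# strict=False.
model.load_state_dict(
    {k.replace("module.", "", 1): v for k, v in checkpoint["state_dict"].items()},
    strict=False,
)
optimizer.load_state_dict(checkpoint["optimizer"])
```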

lyakaap avatar May 07 '22 05:05 lyakaap

Thank you for your reply @lyakaap

By the fourth stage, I meant the final training phase before applying post-processing (the fourth stage as described in your paper). I believe that should be v107_0009.pth.tar.

It would be great if you could share that. Thank you.

shubhamjain0594 avatar May 11 '22 12:05 shubhamjain0594

Then, this will do :) https://drive.google.com/file/d/1ySea-NJp_J0aWvma_WmVbc3Hnwf5LHUf/view

lyakaap avatar May 11 '22 12:05 lyakaap

Thank you for this. Both files have been very useful.

Lastly, what command and what value of gem-eval-p do you use for intermediate evaluation? I am trying to replicate the results, but I get a score of 0.72 with the final model, while you report 0.755 in your paper.
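(For context, gem-eval-p is the exponent of GeM pooling applied when extracting descriptors at evaluation time; a minimal standalone sketch of generalized-mean pooling, not the repo's exact code:)

```python
import torch

def gem(x: torch.Tensor, p: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    # Generalized-mean (GeM) pooling over the spatial dims of a BxCxHxW map:
    # p = 1 is average pooling; p -> infinity approaches max pooling.
    return x.clamp(min=eps).pow(p).mean(dim=(-2, -1)).pow(1.0 / p)

feats = torch.randn(2, 512, 16, 16).abs()  # dummy feature map
desc = gem(feats, p=1.0)                   # -> shape (2, 512)
```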

shubhamjain0594 avatar May 17 '22 09:05 shubhamjain0594

That's weird. It should match the reported performance if you run the inference code as described in the README. Please make sure you evaluate on the private set of phase 1.

lyakaap avatar May 17 '22 10:05 lyakaap

Okay, I found the bug in my evaluation code. Thanks for your help.

Can you also provide the final model after stage 2, i.e. the final model for v86?

shubhamjain0594 avatar May 17 '22 11:05 shubhamjain0594

Sorry, it seems that I have deleted the weights of the models prior to stage 2...

lyakaap avatar Nov 28 '22 01:11 lyakaap

> Then, this will do :) https://drive.google.com/file/d/1ySea-NJp_J0aWvma_WmVbc3Hnwf5LHUf/view

Hey @lyakaap, thank you for your work.

I'd like to reproduce the fourth stage based on this output from stage 3, but I ran into some problems. Do you still remember the batch_size, num_negatives, and learning_rate, and how many GPUs you used to train the fourth stage?

GorillaSX avatar Dec 15 '22 01:12 GorillaSX

@GorillaSX You can check this branch for reproducing our results: https://github.com/lyakaap/ISC21-Descriptor-Track-1st/tree/reproduce

I think you can reproduce it with the following:

```bash
python v107.py \
  -a tf_efficientnetv2_m_in21ft1k --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 --seed 99999 \
  --epochs 10 --lr 0.5 --wd 1e-6 --batch-size 16 --ncrops 2 \
  --gem-p 1.0 --pos-margin 0.0 --neg-margin 1.1 --weight ./v98/train/checkpoint_0001.pth.tar \
  --input-size 512 --sample-size 1000000 --memory-size 1000 \
  ../input/training_images/
```

lyakaap avatar Dec 15 '22 01:12 lyakaap

> @GorillaSX You can check this branch for reproducing our results: https://github.com/lyakaap/ISC21-Descriptor-Track-1st/tree/reproduce (full v107.py command quoted above)

I appreciate it, @lyakaap.

I have tried this, but it seems I cannot fit a batch of 16, each with 30 negative samples, on a single GPU. Would you mind telling me how many GPUs you used, or what the number of negative samples was, if you remember?

GorillaSX avatar Dec 16 '22 09:12 GorillaSX

I remember that I used 16 A100 GPUs.
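For reference, if v107.py keeps the batch-splitting logic of the stock PyTorch multiprocessing-distributed template (which its flags suggest, though this is an assumption), the global --batch-size is divided across the GPUs of each node, so each process sees only a small per-GPU slice. A sketch of that split:

```python
import torch

# Assumption: v107.py keeps the batch-splitting logic of the stock PyTorch
# multiprocessing-distributed template, where each spawned process handles
# an equal share of the global --batch-size.
ngpus_per_node = torch.cuda.device_count()
global_batch_size = 16  # the --batch-size value from the command above
per_gpu_batch_size = global_batch_size // ngpus_per_node
print(f"{per_gpu_batch_size} sample(s) per GPU across {ngpus_per_node} local GPUs")
```

Under that assumption, the per-GPU memory footprint with many GPUs is far smaller than running the same command on a single GPU, which would explain the out-of-memory behavior above.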

lyakaap avatar Dec 16 '22 13:12 lyakaap