desc_top1_err and desc_top5_err are always 100 during training

Open JiahaoXia opened this issue 2 years ago • 4 comments

@feymanpriv Thanks for your great work! When I fine-tune the DOLG model on my customized dataset using your model weights, desc_top1_err and desc_top5_err are always 100. I am not sure whether I have made a mistake on my side. My config is below:

MODEL:
  TYPE: resnet
  DEPTH: 101
  NUM_CLASSES: 3847
  HEADS:
    IN_FEAT: 2048
    REDUCTION_DIM: 512
    MARGIN: 0.15
    SCALE: 30
RESNET:
  TRANS_FUN: bottleneck_transform
  NUM_GROUPS: 1
  WIDTH_PER_GROUP: 64
  STRIDE_1X1: False
BN:
  ZERO_INIT_FINAL_GAMMA: True
OPTIM:
  BASE_LR: 0.01
  LR_POLICY: cos
  STEPS: [0, 30, 60, 90]
  LR_MULT: 0.1
  MAX_EPOCH: 100
  MOMENTUM: 0.9
  NESTEROV: True
  WEIGHT_DECAY: 0.0001
  WARMUP_EPOCHS: 5
TRAIN:
  DATASET: GSV_imgs_bldg_v1
  SPLIT: GSV_imgs_bldg_v1_train_stratify.txt
  BATCH_SIZE: 36
  IM_SIZE: 224
  EVAL_PERIOD: 100
TEST:
  DATASET: GSV_imgs_bldg_v1
  SPLIT: GSV_imgs_bldg_v1_val_stratify.txt
  BATCH_SIZE: 36
  IM_SIZE: 256
NUM_GPUS: 6
DATA_LOADER:
  NUM_WORKERS: 4
CUDNN:
  BENCHMARK: True
OUT_DIR: ./GSV_imgs_bldg_v1_output

and the training command is:

python train.py --cfg configs/resnet101_delg_4gpu_GSV.yaml OUT_DIR ./GSV_imgs_bldg_v1_output NUM_GPUS 6 TRAIN.BATCH_SIZE 36 TEST.BATCH_SIZE 36 PORT 13005 TRAIN.WEIGHTS ./weights/r101_dolg_512.pyth

Could you please help me with it? I really appreciate it.
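
For readers unfamiliar with the metric, here is a minimal sketch of how a top-k error such as desc_top1_err or desc_top5_err is commonly computed. It assumes the usual definition (the percentage of samples whose true label is not among the k highest-scoring classes); the function name `topk_error` and the toy scores are hypothetical and are not taken from the DOLG code.

```python
def topk_error(scores, labels, k):
    """Percentage of samples whose true label is NOT among the k top-scoring classes.

    Assumed definition for illustration; not copied from the DOLG source.
    """
    wrong = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in topk:
            wrong += 1
    return 100.0 * wrong / len(labels)


# Toy example: 3 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
labels = [1, 1, 0]

top1 = topk_error(scores, labels, k=1)  # only the first sample is right
top2 = topk_error(scores, labels, k=2)  # first and second samples are right

# If no true label ever lands in the top k, the error is exactly 100.
all_wrong = topk_error(scores, [2, 2, 0], k=1)
```

Under this definition, with NUM_CLASSES: 3847 a model predicting at chance level would sit at roughly 99.97 top-1 error, so a value pinned at 100 suggests the classifier output is not yet better than random guessing.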

JiahaoXia avatar Nov 01 '22 03:11 JiahaoXia

Have you solved this problem?

chongaxiaopenzai avatar Nov 14 '22 08:11 chongaxiaopenzai

Increasing the batch size solves the problem somehow.

JiahaoXia avatar Nov 30 '22 23:11 JiahaoXia

Hi, I also ran into this problem when fine-tuning the DOLG model on my customized dataset using your model weights. Are there any other solutions?

Cindy-zhangtong avatar Mar 10 '23 07:03 Cindy-zhangtong

I have the same problem too.

> me too
> https://github.com/feymanpriv/DOLG/issues/9#issuecomment-1315157627

If I use the DOLG source as is, it works. However, if I keep everything else the same, use the backbone (resnet101) as is, extract the final feature map, and apply GeM, with everything after that identical to the DOLG source, I get the same problem: the model fails to learn.

Is there anyone who can solve it?
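
For context, the GeM (generalized-mean) pooling mentioned above can be sketched as follows. This is an illustration only, assuming the standard formula (mean of x^p over spatial positions, then the p-th root) applied to a single channel; the function name `gem_pool`, the default p=3.0, and the eps clamp are assumptions and are not taken from the DOLG source.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pool one channel: (mean(max(x, eps) ** p)) ** (1 / p).

    feature_map is a 2D list (H x W) of activations for a single channel.
    """
    # Clamp to eps so the power and root are well defined for non-positive values.
    values = [max(v, eps) for row in feature_map for v in row]
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


channel = [[1.0, 2.0], [3.0, 4.0]]

avg_like = gem_pool(channel, p=1.0)   # p = 1 reduces to average pooling
gem_3 = gem_pool(channel, p=3.0)      # cube root of the mean of cubes
max_like = gem_pool(channel, p=50.0)  # large p approaches max pooling
```

In a real model this would be applied per channel to the backbone's final feature map, typically with tensor operations rather than Python lists.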

peternara avatar Nov 06 '23 05:11 peternara