LibVQ

Trying to train the index with 2 GPUs instead of 8

jamesoneill12 opened this issue · 1 comment

I'm trying to adjust the learning rates for training with 2 GPUs instead of 8. I first tried the same learning rates used for 8 GPUs, and then tried dividing all learning rates by 4, in proportion to using only 2 GPUs. With the settings below for distillation-based indexing, the model does not train: the total training loss never drops below 11.3 over the full 30 epochs (see the training log and settings below).

[screenshot: training log showing the total loss plateauing around 11.3]

```shell
python3 ./learnable_index/train_index.py \
    --preprocess_dir ./data/passage/preprocess \
    --embeddings_dir ./data/passage/evaluate/co-condenser \
    --index_method ivf_opq \
    --ivf_centers_num 10000 \
    --subvector_num 32 \
    --subvector_bits 8 \
    --nprobe 100 \
    --training_mode distill_index \
    --per_device_train_batch_size 512
```
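For reference, the "divide by 4" adjustment I applied is the linear scaling rule: scale each learning rate by the ratio of GPU counts, since the effective batch size shrinks proportionally. A minimal sketch of that calculation (the parameter-group names and base values here are placeholders, not LibVQ's actual defaults):

```python
# Linear scaling rule: scale learning rates in proportion to the number of
# GPUs, since effective batch size = per_device_batch_size * num_gpus.
# The group names and base values below are hypothetical examples.
BASE_GPUS = 8
base_lrs = {"encoder": 1e-5, "ivf_centers": 1e-3, "pq_codebooks": 1e-2}

def scale_lrs(base_lrs, num_gpus, base_gpus=BASE_GPUS):
    """Divide each base learning rate by the GPU-count reduction factor."""
    factor = base_gpus / num_gpus  # e.g. 8 / 2 = 4
    return {name: lr / factor for name, lr in base_lrs.items()}

# Going from 8 GPUs to 2 GPUs divides every learning rate by 4.
print(scale_lrs(base_lrs, num_gpus=2))
```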

jamesoneill12 avatar Dec 15 '22 20:12 jamesoneill12

Hi, the loss is usually less than 1.0 in our reproductions. The result in your figure is strange, but I cannot tell what is wrong from the limited information. Could you provide a complete training log?

staoxiao avatar Dec 18 '22 10:12 staoxiao