Trying to train the index with 2 GPUs instead of 8
I'm trying to adjust the learning rates for training with 2 GPUs instead of 8. I first tried the same learning rates used for 8 GPUs, and then tried dividing all learning rates by 4, in proportion to using 2 GPUs rather than 8. With the settings below for distill-based indexing, I can't get the model to train: the total training loss never drops below 11.3 over 30 epochs (see the training command below).

```bash
python3 ./learnable_index/train_index.py \
    --preprocess_dir ./data/passage/preprocess \
    --embeddings_dir ./data/passage/evaluate/co-condenser \
    --index_method ivf_opq \
    --ivf_centers_num 10000 \
    --subvector_num 32 \
    --subvector_bits 8 \
    --nprobe 100 \
    --training_mode distill_index \
    --per_device_train_batch_size 512
```
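For reference, the scaling rule I applied is the standard linear-scaling heuristic: with the per-device batch size unchanged, the effective batch shrinks by 4x going from 8 GPUs to 2, so each learning rate is divided by 4. A minimal sketch is below; the parameter-group names and base values are placeholders for illustration, not the actual values from the 8-GPU configuration:

```python
# Minimal sketch of the linear LR-scaling rule applied.
# The parameter groups and base learning rates below are hypothetical
# placeholders, not the script's actual defaults.

BASE_GPUS = 8
TARGET_GPUS = 2

# Hypothetical per-parameter-group base LRs used for 8 GPUs.
base_lrs = {
    "encoder": 1e-5,
    "ivf_centers": 1e-3,
    "pq_codebooks": 1e-4,
}

# With per-device batch size fixed, effective batch size scales with the
# GPU count, so each LR is multiplied by TARGET_GPUS / BASE_GPUS (1/4 here).
scale = TARGET_GPUS / BASE_GPUS
scaled_lrs = {name: lr * scale for name, lr in base_lrs.items()}

print(scaled_lrs)
# {'encoder': 2.5e-06, 'ivf_centers': 0.00025, 'pq_codebooks': 2.5e-05}
```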
Hi, in our reproductions the loss is usually below 1.0. The result in your figure looks strange, but I cannot tell what is wrong from the limited information. Could you provide the complete training log?
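(If it helps, appending `2>&1 | tee train.log` to the training command is one way to capture both stdout and stderr to a file; the filename here is just an example.)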