scGPT
mre is increasing while mse is decreasing
Hi, thank you for such wonderful work!
I am trying to pretrain scGPT on a small dataset, using the pipeline in the dev-temp branch (I merged it with the main branch). After solving the issues related to library versions and flash-attn, I finally got pretrain.py to work! However, the training loss looks a little strange.
This is part of the training log:
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 186 | time: 8.76s | valid loss/mse 157.0726 | mre 1.4009
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - Saving the best model to ./save/eval-Mar25-14-06-2024
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 187 | time: 8.13s | valid loss/mse 158.1381 | mre 1.4314
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 188 | time: 5.81s | valid loss/mse 157.6718 | mre 1.3989
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 189 | time: 6.14s | valid loss/mse 158.9929 | mre 1.4236
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 190 | time: 8.69s | valid loss/mse 158.0198 | mre 1.4282
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 191 | time: 8.16s | valid loss/mse 158.5909 | mre 1.4189
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 192 | time: 9.12s | valid loss/mse 158.4677 | mre 1.4159
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - | end of epoch 193 | time: 8.12s | valid loss/mse 158.7186 | mre 1.4422
scGPT - INFO - -----------------------------------------------------------------------------------------
scGPT - INFO - -----------------------------------------------------------------------------------------
As you can see, the validation loss (mse) is quite large, around 157 to 159, and the mre is increasing.
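To show how the two metrics can move in opposite directions, here is a minimal sketch. It assumes mre is the mean of |pred - target| / (target + eps) over the evaluated positions; that definition is my assumption and may not match the scGPT implementation exactly. The target and prediction values are made up for illustration and are not taken from my data.

```python
# Toy illustration: MSE is dominated by large targets, while a relative
# error is dominated by small targets, so the two can disagree.
# The mre definition below is an assumption, not the confirmed scGPT one.

def mse(pred, target):
    """Mean squared error over paired values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def mre(pred, target, eps=1e-6):
    """Mean relative error: |pred - target| / (target + eps), averaged."""
    return sum(abs(p - t) / (t + eps) for p, t in zip(pred, target)) / len(target)

# Hypothetical expression values: one low count and one high count.
target = [1.0, 100.0]

# "Earlier" predictions: close on the small value, far off on the large one.
pred_a = [1.5, 90.0]
# "Later" predictions: better on the large value, worse on the small one.
pred_b = [3.0, 98.0]

print("mse:", mse(pred_a, target), "->", mse(pred_b, target))
print("mre:", mre(pred_a, target), "->", mre(pred_b, target))
```

In this toy case the mse drops from about 50.1 to 4.0 while the relative error rises from about 0.30 to about 1.01, so a model that fits high-expression values better at the expense of low-expression values could produce this pattern.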
My training command is:
DATASET="path to dataset"
LOG_INTERVAL=100
VALID_SIZE_OR_RATIO=0.1
MAX_LENGTH=1200
per_proc_batch_size=64
LAYERS=4
MODEL_SCALE=1
python ./examples/pretrain.py \
--data-source $DATASET \
--save-dir ./save/eval-$(date +%b%d-%H-%M-%Y) \
--max-seq-len $MAX_LENGTH \
--batch-size $per_proc_batch_size \
--eval-batch-size $(($per_proc_batch_size * 2)) \
--epochs 10000 \
--log-interval $LOG_INTERVAL --save-interval 10000 \
--no-cls \
--no-cce \
--fp16 \
--vocab-path "path to vocab.json" \
--nlayers 2 --nheads 2 --embsize 32 --d-hid 32
I was wondering what a normal training run looks like. Any help is welcome!
@subercui Hi, may I ask for details about your training curve? I was wondering whether the training log above looks correct.