
Finetuning Issue with Example Data

Open mosala777 opened this issue 2 years ago • 4 comments

Hello,

I tried fine-tuning the 6-mer model with the provided example data. Performance seems good at first, but then it suddenly drops significantly and raises a warning. The first few evaluations start normally and improve like this:

05/20/2022 19:32:46 - INFO - main - ***** Eval results *****
05/20/2022 19:32:46 - INFO - main - acc = 0.944
05/20/2022 19:32:46 - INFO - main - auc = 0.988568
05/20/2022 19:32:46 - INFO - main - f1 = 0.9439997759991039
05/20/2022 19:32:46 - INFO - main - mcc = 0.8880071040852491
05/20/2022 19:32:46 - INFO - main - precision = 0.9440071041136658
05/20/2022 19:32:46 - INFO - main - recall = 0.944

Then the metrics drop significantly and the following warnings appear:

/home/maborageh/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/maborageh/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
05/20/2022 21:09:44 - INFO - main - ***** Eval results *****
05/20/2022 21:09:44 - INFO - main - acc = 0.5
05/20/2022 21:09:44 - INFO - main - auc = 0.59788
05/20/2022 21:09:44 - INFO - main - f1 = 0.3333333333333333
05/20/2022 21:09:44 - INFO - main - mcc = 0.0
05/20/2022 21:09:44 - INFO - main - precision = 0.25
05/20/2022 21:09:44 - INFO - main - recall = 0.5
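For context, these warnings are the signature of a model that has collapsed to predicting a single class. A minimal sketch with synthetic labels (not the DNABERT data, and assuming macro-averaged metrics) reproduces exactly the degenerate values in the log above:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, matthews_corrcoef

# Balanced ground truth, but the collapsed model predicts class 0 for every
# sample -- the degenerate state the eval log above shows.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0])

# Class 1 gets no predicted samples, so its precision is undefined; by default
# sklearn sets it to 0 and emits the UndefinedMetricWarning seen in the log.
# Passing zero_division=0 keeps the same value but silences the warning.
prec = precision_score(y_true, y_pred, average="macro", zero_division=0)
rec = recall_score(y_true, y_pred, average="macro")
mcc = matthews_corrcoef(y_true, y_pred)  # denominator is 0, so sklearn returns 0.0

print(prec, rec, mcc)  # 0.25 0.5 0.0 -- the same collapsed values as in the log
```

So the warnings themselves are harmless messengers; the real problem is that training has diverged into an all-one-class predictor.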

I saw that some updates were made as mentioned in #10 but I'm still facing this issue. I would appreciate any feedback from you.

Kind regards, Salah

mosala777 avatar May 23 '22 14:05 mosala777

Hi developers

I am facing the same issues as @mosala777. Here is part of the stdout I observed:

12/20/2022 14:46:07 - INFO - main - Loading features from cached file /home/weiyuan/Desktop/rbp/model/HNRNPA1/data_HNRNPA1/cached_dev_DNABERT3_101_dnaprom
12/20/2022 14:46:07 - INFO - main - ***** Running evaluation *****
12/20/2022 14:46:07 - INFO - main - Num examples = 5350
12/20/2022 14:46:07 - INFO - main - Batch size = 32
Evaluating: 100%|█████████████| 168/168 [03:38<00:00, 1.30s/it]
12/20/2022 14:49:46 - INFO - main - ***** Eval results *****
12/20/2022 14:49:46 - INFO - main - acc = 0.7816822429906543
12/20/2022 14:49:46 - INFO - main - auc = 0.8839985990874502
12/20/2022 14:49:46 - INFO - main - f1 = 0.7782037084362665
12/20/2022 14:49:46 - INFO - main - mcc = 0.5846169035386265
12/20/2022 14:49:46 - INFO - main - precision = 0.8024470439531897
12/20/2022 14:49:46 - INFO - main - recall = 0.7825097242114136
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
  warnings.warn("To get the last learning rate computed by the scheduler, "
{"eval_acc": 0.7816822429906543, "eval_f1": 0.7782037084362665, "eval_mcc": 0.5846169035386265, "eval_auc": 0.8839985990874502, "eval_precision": 0.8024470439531897, "eval_recall": 0.7825097242114136, "learning_rate": 0.0001956160743938891, "loss": 0.4473942193388939, "step": 400}
12/20/2022 14:57:46 - INFO - main - Loading features from cached file /home/weiyuan/Desktop/rbp/model/HNRNPA1/data_HNRNPA1/cached_dev_DNABERT3_101_dnaprom
12/20/2022 14:57:46 - INFO - main - ***** Running evaluation *****
12/20/2022 14:57:46 - INFO - main - Num examples = 5350
12/20/2022 14:57:46 - INFO - main - Batch size = 32
Evaluating: 100%|█████████████| 168/168 [03:16<00:00, 1.17s/it]
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use zero_division parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:900: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
12/20/2022 15:01:03 - INFO - main - ***** Eval results *****
12/20/2022 15:01:03 - INFO - main - acc = 0.4968224299065421
12/20/2022 15:01:03 - INFO - main - auc = 0.4861425095900458
12/20/2022 15:01:03 - INFO - main - f1 = 0.3319180819180819
12/20/2022 15:01:03 - INFO - main - mcc = 0.0
12/20/2022 15:01:03 - INFO - main - precision = 0.24841121495327104
12/20/2022 15:01:03 - INFO - main - recall = 0.5
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
  warnings.warn("To get the last learning rate computed by the scheduler, "
{"eval_acc": 0.4968224299065421, "eval_f1": 0.3319180819180819, "eval_mcc": 0.0, "eval_auc": 0.4861425095900458, "eval_precision": 0.24841121495327104, "eval_recall": 0.5, "learning_rate": 0.0001889737628694786, "loss": 0.5985230031609535, "step": 500}

I would appreciate any feedback too, thank you.

Best Regards WY

CherWeiYuan avatar Dec 20 '22 07:12 CherWeiYuan

The same problem is discussed here: https://github.com/ThilinaRajapakse/simpletransformers/issues/234

The solutions seem to be:

  • lower the learning rate (--learning_rate 2e-5)
  • use a smaller batch size (--per_gpu_train_batch_size 64)
  • possibly delete the cache (--overwrite_cache)
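Putting the suggestions above together, the invocation would look roughly like this. The script name (run_finetune.py) and the elided model/data arguments are assumptions about your setup; only the three flags at the end come from this thread:

```shell
# Hypothetical fine-tuning command; replace "..." with your existing
# model, tokenizer, and data arguments.
python run_finetune.py \
    ... \
    --learning_rate 2e-5 \
    --per_gpu_train_batch_size 64 \
    --overwrite_cache
```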

Edited: the parameters above are the ones I used to get a smooth run.

CherWeiYuan avatar Dec 26 '22 15:12 CherWeiYuan

Hi @CherWeiYuan, I am facing a similar issue to the one mentioned above. I tried your solution, but I am still not seeing any improvement in the results.

NikitaBhandare avatar Mar 01 '23 03:03 NikitaBhandare

Hi, @NikitaBhandare! This solution worked for me, but I had to manually delete all models from ~/.cache/huggingface/hub/ and set seeds for numpy, torch, cuda, etc. before loading a new model. Did you try this?

P.S. The lr=2e-4 looks like a typo; it should definitely be 2e-5.
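The seeding step can be sketched as a small helper. This is a generic pattern, not code from the DNABERT repo; the torch lines are guarded so the sketch also runs where PyTorch is not installed:

```python
import os
import random

import numpy as np


def seed_everything(seed: int = 42) -> None:
    """Seed every RNG we can reach so repeated fine-tuning runs are comparable."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
    except ImportError:
        pass  # PyTorch not installed; numpy/random seeding still applies


# Reseeding before each run reproduces the same random draws.
seed_everything(42)
a = np.random.rand(3)
seed_everything(42)
b = np.random.rand(3)
print(np.array_equal(a, b))  # True
```

Call it once before loading the model and building the dataloaders, so that weight initialization of the classification head and data shuffling are identical across runs.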

logvinata avatar Jun 13 '23 11:06 logvinata