DNABERT
Finetuning Issue with Example Data
Hello,
I tried finetuning the 6-mer model with the provided example data. Performance seems to be good at first, but then it suddenly drops significantly raising a warning. First few evaluations start normally and improve like this:
05/20/2022 19:32:46 - INFO - main - ***** Eval results *****
05/20/2022 19:32:46 - INFO - main - acc = 0.944
05/20/2022 19:32:46 - INFO - main - auc = 0.988568
05/20/2022 19:32:46 - INFO - main - f1 = 0.9439997759991039
05/20/2022 19:32:46 - INFO - main - mcc = 0.8880071040852491
05/20/2022 19:32:46 - INFO - main - precision = 0.9440071041136658
05/20/2022 19:32:46 - INFO - main - recall = 0.944
Then the metrics drop significantly and the following warnings appear:
/home/maborageh/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:1248: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/home/maborageh/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:873: RuntimeWarning: invalid value encountered in double_scalars
mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
05/20/2022 21:09:44 - INFO - main - ***** Eval results *****
05/20/2022 21:09:44 - INFO - main - acc = 0.5
05/20/2022 21:09:44 - INFO - main - auc = 0.59788
05/20/2022 21:09:44 - INFO - main - f1 = 0.3333333333333333
05/20/2022 21:09:44 - INFO - main - mcc = 0.0
05/20/2022 21:09:44 - INFO - main - precision = 0.25
05/20/2022 21:09:44 - INFO - main - recall = 0.5
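These warnings are the classic signature of a classifier that has collapsed to predicting only one class. As a minimal pure-Python sketch (my own illustration, not from the training code) of why precision and MCC degenerate in that case:

```python
import math

def binary_metrics(y_true, y_pred):
    """Confusion-matrix based precision (for class 1) and MCC."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision for class 1 is undefined when the model never predicts 1;
    # sklearn sets it to 0.0 and emits the UndefinedMetricWarning.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0

    # The MCC denominator is 0 when any confusion-matrix margin is empty,
    # producing the "invalid value encountered" RuntimeWarning (0/0 -> nan);
    # sklearn effectively reports 0.0 in that case, as done here.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn - fp * fn) / denom) if denom > 0 else 0.0
    return precision, mcc

# Balanced labels, but the model predicts class 0 for everything:
print(binary_metrics([0, 0, 1, 1], [0, 0, 0, 0]))  # (0.0, 0.0)
```

With everything predicted as one class, accuracy and macro-averaged recall both sit at 0.5 on a balanced set, which matches the numbers in the log above.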
I saw that some updates were made, as mentioned in #10, but I'm still facing this issue. I would appreciate any feedback from you.
Kind regards, Salah
Hi developers
I am facing the same issues as @mosala777. Here is part of the stdout I observed:
- INFO - main - Loading features from cached file /home/weiyuan/Desktop/rbp/model/HNRNPA1/data_HNRNPA1/cached_dev_DNABERT3_101_dnaprom
12/20/2022 14:46:07 - INFO - main - ***** Running evaluation *****
12/20/2022 14:46:07 - INFO - main - Num examples = 5350
12/20/2022 14:46:07 - INFO - main - Batch size = 32
Evaluating: 100%|█████████████| 168/168 [03:38<00:00, 1.30s/it]
12/20/2022 14:49:46 - INFO - main - ***** Eval results *****
12/20/2022 14:49:46 - INFO - main - acc = 0.7816822429906543
12/20/2022 14:49:46 - INFO - main - auc = 0.8839985990874502
12/20/2022 14:49:46 - INFO - main - f1 = 0.7782037084362665
12/20/2022 14:49:46 - INFO - main - mcc = 0.5846169035386265
12/20/2022 14:49:46 - INFO - main - precision = 0.8024470439531897
12/20/2022 14:49:46 - INFO - main - recall = 0.7825097242114136
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
{"eval_acc": 0.7816822429906543, "eval_f1": 0.7782037084362665, "eval_mcc": 0.5846169035386265, "eval_auc": 0.8839985990874502, "eval_precision": 0.8024470439531897, "eval_recall": 0.7825097242114136, "learning_rate": 0.0001956160743938891, "loss": 0.4473942193388939, "step": 400}
12/20/2022 14:57:46 - INFO - main - Loading features from cached file /home/weiyuan/Desktop/rbp/model/HNRNPA1/data_HNRNPA1/cached_dev_DNABERT3_101_dnaprom
12/20/2022 14:57:46 - INFO - main - ***** Running evaluation *****
12/20/2022 14:57:46 - INFO - main - Num examples = 5350
12/20/2022 14:57:46 - INFO - main - Batch size = 32
Evaluating: 100%|█████████████| 168/168 [03:16<00:00, 1.17s/it]
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:1272: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/sklearn/metrics/_classification.py:900: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(cov_ytyt * cov_ypyp)
12/20/2022 15:01:03 - INFO - main - ***** Eval results *****
12/20/2022 15:01:03 - INFO - main - acc = 0.4968224299065421
12/20/2022 15:01:03 - INFO - main - auc = 0.4861425095900458
12/20/2022 15:01:03 - INFO - main - f1 = 0.3319180819180819
12/20/2022 15:01:03 - INFO - main - mcc = 0.0
12/20/2022 15:01:03 - INFO - main - precision = 0.24841121495327104
12/20/2022 15:01:03 - INFO - main - recall = 0.5
/home/weiyuan/mambaforge/envs/dnabert/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:247: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  warnings.warn("To get the last learning rate computed by the scheduler, "
{"eval_acc": 0.4968224299065421, "eval_f1": 0.3319180819180819, "eval_mcc": 0.0, "eval_auc": 0.4861425095900458, "eval_precision": 0.24841121495327104, "eval_recall": 0.5, "learning_rate": 0.0001889737628694786, "loss": 0.5985230031609535, "step": 500}
I would appreciate any feedback too, thank you.
Best Regards WY
The same problem is discussed here: https://github.com/ThilinaRajapakse/simpletransformers/issues/234
The solutions seem to be:
- lower the learning rate (--learning_rate 2e-5)
- use a smaller batch size (--per_gpu_train_batch_size 64)
- possibly delete the cache (--overwrite_cache)
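For concreteness, here is a rough sketch of how those flags might be combined in a finetuning command (the paths are placeholders and the remaining arguments are adapted from the DNABERT finetuning example; adjust for your own setup):

```shell
# Suggested fixes: lower LR, smaller batch, rebuild the feature cache.
python run_finetune.py \
    --model_type dna \
    --tokenizer_name dna6 \
    --model_name_or_path "$MODEL_PATH" \
    --task_name dnaprom \
    --do_train \
    --do_eval \
    --data_dir "$DATA_PATH" \
    --output_dir "$OUTPUT_PATH" \
    --learning_rate 2e-5 \
    --per_gpu_train_batch_size 64 \
    --overwrite_cache
```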
Edited: the parameters above are the ones I used to get a smooth run.
Hi @CherWeiYuan, I am facing a similar issue to the one mentioned above. I tried your solution, but I am still not seeing any improvement in the results.
Hi, @NikitaBhandare! This solution worked for me, but I had to manually delete all models from ~/.cache/huggingface/hub/ and set seeds for numpy, torch, CUDA, etc. before loading a new model. Did you try this? P.S. It looks like lr=2e-4 is an error; it should definitely be 2e-5.
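The seeding step can be sketched like this (the helper name is mine; numpy and torch are seeded only if they are installed, so the snippet also runs in a bare environment):

```python
import random

def set_all_seeds(seed: int) -> None:
    """Seed every RNG in play before loading or finetuning a model."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        # Queued for all CUDA devices; harmless on CPU-only machines.
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

set_all_seeds(42)
a = random.random()
set_all_seeds(42)
assert random.random() == a  # same seed -> same draws
```

Call this once before each model load so that reruns with the same hyperparameters are comparable.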