David Chiu
In the attention module, l_Q, l_K, and l_V are designed to be learnable. However, since this paper uses pretrained models, there is no need to train them.
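The comment above does not spell out the formulation. The sketch below assumes l_Q, l_K, and l_V are linear projection matrices applied to the input before standard scaled dot-product attention, and that "pretrained" means they are loaded and used as-is. In PyTorch such weights would typically be frozen with `requires_grad_(False)`. This dependency-free version only shows the projections being read and never updated. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def attention(x: Matrix, l_q: Matrix, l_k: Matrix, l_v: Matrix) -> Matrix:
    """Scaled dot-product attention with fixed (assumed pretrained) projections.

    l_q, l_k, l_v are only read here and never modified, i.e. they are
    treated as frozen weights rather than trainable parameters.
    """
    q, k, v = matmul(x, l_q), matmul(x, l_k), matmul(x, l_v)
    d_k = len(l_k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)
```

With identity projections, each output row is just the attention weights for that token, so every row sums to 1.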
Hello reviewers, I plan to merge all the interfaces in `torch/optim`. So far I have merged only one (`torch/optim/adadelta.py`). Could you please take a look at it? I will continue merging...
Thanks, @janeyx99. I'm also wondering whether this PR could be included in v2.3.1 if I request a cherry-pick in the release tracker issue myself. Is the release tracker issue available...
> This wouldn’t qualify for the cherry-pick for 2.3.1 as this change isn’t a crucial fix for anything. Is there a reason to not wait for the next cycle 2.4?...
> oh yes i believe so! https://dev-discuss.pytorch.org/t/pytorch-release-2-3-1-planning/2052

Thanks for your reply 😊
@pytorchbot drci
Thanks for your review, @janeyx99. I split the `lr_scheduler` changes out into #125556.
@pytorchbot merge
@pytorchbot merge
@pytorchbot rebase -b main