Tim Moon


Merging with approval from @ptrendx and @ksivaman.

Logically, `scale` is part of the recipe and `scale_inv` is part of the data:

- You can change `scale` at any time, for whatever reason, to any value. It is...
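A minimal sketch of that split, assuming a delayed-scaling-style FP8 recipe: the recipe picks `scale` to map values into the representable range, while `scale_inv` is recorded alongside the quantized data so it can be decoded later, even after `scale` changes. (Pure-Python illustration, not Transformer Engine's actual kernels.)

```python
FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

def quantize(values, scale):
    """Scale values toward the FP8 range; return data plus its scale_inv."""
    data = [v * scale for v in values]  # real kernels would cast to FP8 here
    return data, 1.0 / scale            # scale_inv travels with the data

def dequantize(data, scale_inv):
    """Decode using the scale_inv stored with the data, not the live scale."""
    return [d * scale_inv for d in data]

amax = 2.0
scale = FP8_E4M3_MAX / amax                    # recipe chooses the scale
data, scale_inv = quantize([0.5, -2.0], scale)
restored = dequantize(data, scale_inv)         # approximately [0.5, -2.0]
```

Because `scale_inv` is captured at quantization time, updating `scale` afterward does not invalidate already-quantized tensors.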

For better or for worse, I think "AdamW" now refers to the LR-coupled version. In addition to [PyTorch](https://github.com/pytorch/pytorch/blob/d921891f5788b37ea92eceddf7417d11e44290e6/torch/optim/_functional.py#L125) and [JAX](https://github.com/google-deepmind/optax/blob/2cdb89cc4935d8dc5c8a06344e7d50dc7a7419b0/optax/_src/alias.py#L640), I see this formulation in [Keras](https://github.com/keras-team/keras/blob/8f5592bcb61ff48c96560c8923e482db1076b54a/keras/src/optimizers/base_optimizer.py#L828C44-L829C1) (and therefore [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/AdamW)), [PaddlePaddle](https://github.com/PaddlePaddle/Paddle/blob/2aebcd88c95ac8c2d917f6485c295f8837f71131/python/paddle/optimizer/adamw.py#L66),...
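For reference, a pure-Python sketch of the two conventions, with hypothetical names and an already-computed Adam update direction assumed; in the LR-coupled form the weight decay is multiplied by the learning rate, in the original decoupled form it is not:

```python
def adamw_coupled_step(param, adam_update, lr, weight_decay):
    """LR-coupled AdamW (PyTorch/JAX/Keras convention):
    param <- param - lr * (adam_update + weight_decay * param)"""
    return param - lr * (adam_update + weight_decay * param)

def adamw_decoupled_step(param, adam_update, lr, weight_decay):
    """Decoupled form from the original AdamW paper:
    param <- param - lr * adam_update - weight_decay * param"""
    return param - lr * adam_update - weight_decay * param
```

With `lr=0.1`, `weight_decay=0.01`, `param=1.0`, and `adam_update=0.5`, the coupled step shrinks the parameter by `lr * weight_decay * param = 0.001` from decay, while the decoupled step removes the full `weight_decay * param = 0.01`, so the two diverge unless the decay rate is rescaled by the learning rate.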

Can you provide a minimal reproducer? The following runs for me:

```python
import torch
import transformer_engine.pytorch as te

# Options
batch_size = 128
hidden_size = 128
dtype = torch.float32
device...
```

Most of our kernels don't handle `Float8Tensor` directly. Also, our RMSNorm kernel doesn't support FP8 input at the moment, just FP8 output. As a quick fix, you could manually cast...
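The cast-then-normalize workaround can be sketched as follows (pure Python for illustration; with Transformer Engine one would dequantize the `Float8Tensor` to a higher-precision dtype before the norm, and the exact API call is omitted here):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """Reference RMSNorm on a list of floats:
    y_i = w_i * x_i / sqrt(mean(x^2) + eps)"""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

# Hypothetical workaround: decode the FP8 data to full precision first,
# since the norm kernel here supports FP8 output but not FP8 input.
fp8_data, scale_inv = [112.0, -224.0], 1.0 / 224.0  # quantized values
x_hp = [d * scale_inv for d in fp8_data]            # manual upcast/dequantize
y = rms_norm(x_hp, weight=[1.0, 1.0])               # runs in high precision
```

The extra cast costs a memory round trip, which is why handling FP8 input directly in the kernel would be the longer-term fix.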