Sidharth Baskaran

Results: 2 issues by Sidharth Baskaran

In the Transformer, a weight-sharing scheme between the input embedding and the output projection layer is used to improve efficiency. Are there any reasons why this is not implemented, and how it...
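For readers unfamiliar with the scheme being asked about, here is a minimal sketch of weight sharing (weight tying) in plain Python. It is only an illustration, not the code of the repository in question: the class `TiedEmbeddingModel` and its methods are hypothetical names, and the model skips everything between the embedding lookup and the output projection.

```python
import random


class TiedEmbeddingModel:
    """Toy model (hypothetical) whose output projection reuses the input embedding."""

    def __init__(self, vocab_size, d_model, seed=0):
        rng = random.Random(seed)
        # One matrix of shape (vocab_size, d_model), stored as a list of rows.
        self.embedding = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(vocab_size)
        ]
        # Weight tying: the output projection is the same object, not a copy.
        self.output_projection = self.embedding

    def embed(self, token_id):
        # Input side: look up the row for this token.
        return self.embedding[token_id]

    def logits(self, hidden):
        # Output side: one dot product per vocabulary row, using the shared matrix.
        return [
            sum(w * h for w, h in zip(row, hidden))
            for row in self.output_projection
        ]

    def num_parameters(self):
        # Only one matrix is stored, so it is counted once.
        return len(self.embedding) * len(self.embedding[0])


model = TiedEmbeddingModel(vocab_size=5, d_model=3)
hidden = model.embed(2)        # assumption: hidden state is just the embedding
scores = model.logits(hidden)  # one score per vocabulary entry
```

Because both sides point at the same matrix, the model stores `vocab_size * d_model` parameters for the two layers instead of twice that, and any update to the embedding is seen by the output projection as well.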

### 🚀 The feature, motivation and pitch

As the kernels seem to be limited to the FP32 data type at the moment, it would be immensely helpful to have the...

feature