Sidharth Baskaran

Results: 2 issues
In the Transformer, the input embedding and the output projection layer share weights to improve efficiency. Is there a reason this is not implemented, and how it...
### 🚀 The feature, motivation and pitch

As the kernels appear to be limited to the FP32 data type at the moment, it would be immensely helpful to have the...
Label: feature