aayux
It is better practice to apply weight decay in every layer. This commit proposes changing the current implementation of L2 regularization from just the output layer to the...
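As a rough illustration of the difference, the sketch below compares an L2 penalty computed over the output layer only with one computed over every layer. It is not the code from the commit: the helper `l2_penalty`, the toy weight matrices, and the coefficient `lam` are all hypothetical, and biases are left out for brevity.

```python
# Hypothetical helper and toy weights, for illustration only; this is not
# the repository's implementation.

def l2_penalty(weight_matrices, lam):
    """Return lam * (sum of squared entries) over the given weight matrices."""
    total = 0.0
    for matrix in weight_matrices:
        for row in matrix:
            for w in row:
                total += w * w
    return lam * total


# Assumed two-layer network: one hidden layer and one output layer.
hidden_weights = [[1.0, -2.0], [0.5, 0.0]]
output_weights = [[3.0], [-1.0]]
lam = 0.01  # assumed regularization strength

# Penalty when only the output layer is regularized.
output_only = l2_penalty([output_weights], lam)

# Penalty when every layer is regularized.
all_layers = l2_penalty([hidden_weights, output_weights], lam)

print(output_only, all_layers)
```

With the output-only variant, the hidden weights contribute nothing to the penalty, so they are free to grow without any regularization pressure.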
- Python 3 style division to resolve bug in #24.
- Other \_\_future\_\_ imports for Python 3 compatibility.

Finally, this could also break intended behaviour in some places so: -...
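For context, a minimal sketch of the kind of change Python 3 style division brings. The `mean` helper is hypothetical and is not taken from the bug in #24. On Python 2 the same behaviour requires `from __future__ import division` as the first statement of the module; on Python 3 it is the default.

```python
# Illustrative only: shows Python 3 (true) division semantics.
# On Python 2, put `from __future__ import division` at the top of the
# module to get the same results.

def mean(values):
    """Average with true division, so integer inputs are not truncated."""
    return sum(values) / len(values)


average = mean([1, 2])  # true division keeps the fractional part
floored = 7 // 2        # floor division must now be requested explicitly

print(average, floored)
```

Any code that relied on `/` truncating integers has to switch to `//`, which is one way a change like this can alter intended behaviour.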
Looking for contributors to help with a major code cleanup and refactoring based on the PEP 8 style guidelines.
Add a `BiAttentionPoolingClassifier` (self-attention pooling for the linear classifier), as in [Attention Is All You Need](https://arxiv.org/abs/1706.03762), following the discussion with @sebastianruder in Teams. I ran out of memory on my...