Xa9aX ツ
@oke-aditya You can add support for [Echo](https://github.com/digantamisra98/Echo). We will be releasing a beta version soon (New Year's Eve) containing optimizers, activations and attention layers. Subsequent releases will contain all custom layers...
@oke-aditya Absolutely understandable.
@extremety1989 are you planning to submit a PR?
@extremety1989 Mish has a Softplus operator, which needs a proper threshold to fix the NaN issue you might be facing.
@extremety1989 The Softplus operator thresholds that TensorFlow uses are in the range [0, 20].
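To illustrate the thresholding point above, here is a minimal sketch in plain Python (no framework). The function names and the default `threshold=20.0` are assumptions chosen for illustration, taken from the upper end of the range mentioned above; this is not the code from any particular Mish implementation.

```python
import math


def softplus(x: float, threshold: float = 20.0) -> float:
    """Softplus log(1 + exp(x)), switching to the identity above `threshold`.

    The threshold value is an assumption for this sketch.
    """
    if x > threshold:
        # For large x, softplus(x) is numerically equal to x, and skipping
        # exp(x) avoids overflow (inf/NaN in float tensors, OverflowError here).
        return x
    return math.log1p(math.exp(x))


def mish(x: float, threshold: float = 20.0) -> float:
    """Mish(x) = x * tanh(softplus(x)), using the thresholded softplus."""
    return x * math.tanh(softplus(x, threshold))
```

With the threshold in place, `mish(1000.0)` returns a finite value, whereas evaluating `math.exp(1000.0)` directly would overflow.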
@songyuc There is a closed feature request on PyTorch for adding Mish. You can comment over there for increased visibility so that Mish can be considered to be added in...
Thanks for the appreciation of my work. I'm glad Mish has been working well in your projects. This is an interesting observation; I haven't extensively investigated the optimal initialization...
Right, the reason I'm interested in orthogonal initialization is [this](https://github.com/EsterHlav/Dynamical-Isometry-from-Orthogonality-Neural-Nets). Let me know if you have had any progress with orthogonal initialization.
@evanatyourservice Thanks for the update. Interesting. With orthogonal initialization, you need to keep the init on the EOC; otherwise you'll have either vanishing or exploding gradients. Keep me posted with your...
@small-Qing For sequence data, we would have to implement an `Evonorm1d` version, since the 2d version expects a 4-dimensional tensor of the shape ``. This shouldn't be that difficult to...
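As a rough idea of what a 1d variant could look like, here is a NumPy sketch of the EvoNorm-S0 formula applied to 3-dimensional input. Everything here is an assumption for illustration: the function name `evonorm_s0_1d`, the `(N, C, L)` layout, the choice of the S0 variant, and the parameter shapes. It is not the repository's implementation.

```python
import numpy as np


def evonorm_s0_1d(x, gamma, beta, v, groups=4, eps=1e-5):
    """EvoNorm-S0 sketch for sequence input of assumed shape (N, C, L).

    gamma, beta and v are per-channel parameters of shape (1, C, 1).
    """
    n, c, length = x.shape
    if c % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # Group standard deviation over (channels-in-group, length).
    grouped = x.reshape(n, groups, c // groups, length)
    var = grouped.var(axis=(2, 3), keepdims=True)
    std = np.sqrt(var + eps)
    std = np.broadcast_to(std, grouped.shape).reshape(n, c, length)
    # Numerator: x * sigmoid(v * x).
    numerator = x / (1.0 + np.exp(-v * x))
    return numerator / std * gamma + beta


# Example call with hypothetical sizes: batch 2, 8 channels, length 5.
x = np.random.default_rng(0).normal(size=(2, 8, 5))
params_shape = (1, 8, 1)
y = evonorm_s0_1d(x, np.ones(params_shape), np.zeros(params_shape), np.ones(params_shape))
```

The only change from a 2d layout in this sketch is that the statistics are reduced over a single length axis instead of height and width.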