sparsegen

Code for the NeurIPS 2018 paper "On Controllable Sparse Alternatives to Softmax"

1 sparsegen issue

Hi! I'm trying to use these sparse functions as an alternative to the softmax function in the attention mechanisms of transformers. However, the loss becomes NaN in the first iteration......
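For readers unfamiliar with what "a sparse alternative to softmax" computes, here is a minimal pure-Python sketch of sparsemax, one well-known function in this family. It is not the repository's implementation and does not diagnose the NaN reported above. The function name `sparsemax` and the input check are assumptions added for illustration; the check only makes a NaN or infinity arriving from upstream fail at the input rather than in the loss.

```python
import math


def sparsemax(scores):
    """Euclidean projection of a list of real scores onto the probability simplex.

    Illustrative sketch only (hypothetical helper, not the sparsegen code).
    """
    # Assumption: reject NaN/inf early so non-finite inputs are easy to spot.
    if not all(math.isfinite(s) for s in scores):
        raise ValueError("scores must be finite")

    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    # The support is the longest prefix of the sorted scores that satisfies
    # 1 + j * z_(j) > sum of the top-j scores; tau is the matching threshold.
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j

    # Entries at or below the threshold become exactly zero.
    return [max(s - tau, 0.0) for s in scores]


print(sparsemax([1.0, 2.0, 3.0]))  # [0.0, 0.0, 1.0]
```

Unlike softmax, the output can contain exact zeros, which is the property that makes such functions attractive for attention weights. In an attention layer the function would be applied row by row to the score matrix; a batched tensor version would follow the same sort, cumulative sum, and threshold steps.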