sparsegen
Code for the NeurIPS 2018 paper "On Controllable Sparse Alternatives to Softmax"
Results
1
sparsegen issues
Hi! I'm trying to use these sparse functions as an alternative to the softmax function in the attention mechanisms of transformers. However, the loss becomes NaN in the first iteration...
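For anyone trying to reproduce the report, a small reference implementation can help narrow down where non-finite values first appear. The following is a minimal NumPy sketch of sparsemax, one of the sparse softmax alternatives in this family, for a single 1-D score vector. It is not the repository's code: the names `sparsemax` and `is_finite_distribution` are hypothetical, and the issue does not say which function or framework was used.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of a 1-D score vector onto the probability simplex.

    Illustrative sketch only; not taken from the sparsegen repository.
    """
    z = np.asarray(z, dtype=np.float64)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    k = np.arange(1, z.size + 1)           # candidate support sizes 1..n
    cumsum = np.cumsum(z_sorted)
    # Largest k for which the k-th sorted score stays above the threshold.
    support = 1 + k * z_sorted > cumsum
    k_z = k[support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z      # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)


def is_finite_distribution(p, tol=1e-9):
    """Check that p has no NaN/inf entries and sums to 1 (hypothetical helper)."""
    p = np.asarray(p)
    return bool(np.all(np.isfinite(p)) and abs(p.sum() - 1.0) < tol)


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]
    probs = sparsemax(scores)
    print(probs, is_finite_distribution(probs))
```

Running the same check on the attention scores and on the attention weights of the first training step would show whether the non-finite values are already present in the inputs to the sparse function or are produced later.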