
Single Headed Attention RNN - "Stop thinking with your head"

Results 13 sha-rnn issues

Thanks for the great paper! I've created another open source implementation of the SHA RNN here: https://github.com/talonvoice/sha-rnn I trained with similar parameters to the single head model at the end...

I've successfully reimplemented your work in Julia / Knet DL framework here [SHA-RNN.jl](https://github.com/alisafaya/SHA-RNN.jl). During training I've faced some problems with the first batch of the dataset. Since there is no...

![image](https://user-images.githubusercontent.com/5769148/97804190-e8986a00-1c56-11eb-9879-2cb8157c265a.png) In Figure 1 there's a claim that the attention module is "highly efficient". This is explained by removing the need for K/V transforms. Then for the attention scores block it...

Can you provide any further information on the loss function you are using? Perhaps a reference to a paper?

Could you clarify what's the license of this code base? It would be helpful to say so explicitly as well as to add a license file to the repo. Thank...

Thanks for sharing this code! I'd like to try it on my own training dataset, but I keep getting GPU OOM problems: `RuntimeError: CUDA out of memory. Tried to allocate...`

First, thanks for such a nice paper and work! I'm trying to train a text generation model with my own dataset. The tokenize function in data.py https://github.com/Smerity/sha-rnn/blob/218d748022dbcf32d50bbbb4d151a9b6de3f8bba/data.py#L34 uses split() to tokenize...
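
For readers unfamiliar with the behavior this issue refers to, here is a minimal sketch of whitespace tokenization with Python's built-in `str.split()`. The helper name `whitespace_tokenize` and the sample sentence are hypothetical and are not taken from `data.py`; the sketch only illustrates how `split()` treats whitespace and punctuation.

```python
def whitespace_tokenize(line):
    # Hypothetical helper, not the repository's function.
    # str.split() with no arguments splits on any run of whitespace
    # and drops leading/trailing whitespace, including the newline.
    return line.split()


# Punctuation stays attached to the neighbouring word, because
# split() only looks at whitespace.
tokens = whitespace_tokenize("Hello, world!  It's   2020.\n")
print(tokens)  # ['Hello,', 'world!', "It's", '2020.']
```

Note that text without spaces between words would come back as a single token under this scheme.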

Hi @Smerity, thanks for open sourcing the code for that great project :heart: I trained a character-based model for German on ~1GB of text (mainly from OPUS). It worked...

Hi @Smerity, could you share the pretrained SHA-RNN weights from your WikiText103 experiments? I'd like to do some fine-tuning experiments with it for text classification. (It would be a...