sha-rnn
Single Headed Attention RNN - "Stop thinking with your head"
Thanks for the great paper! I've created another open-source implementation of the SHA-RNN here: https://github.com/talonvoice/sha-rnn I trained with parameters similar to the single-head model at the end...
I've successfully reimplemented your work in Julia with the Knet DL framework here: [SHA-RNN.jl](https://github.com/alisafaya/SHA-RNN.jl). During training I faced some problems with the first batch of the dataset. Since there is no...
In Figure 1 there's a claim that the attention module is "highly efficient". This is explained by the removal of the K/V transforms. Then for the attention scores block it...
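To make the efficiency claim concrete, the simplification under discussion (only the query is projected, while keys and values are taken directly from the memory, i.e. no K/V transforms) can be sketched roughly as below. This is an illustrative NumPy sketch, not the paper's exact formulation; `w_q` and the scaling are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def simplified_attention(query, memory, w_q):
    """Single-headed attention with no key/value projections (sketch).

    query:  (d,)   current hidden state
    memory: (T, d) past hidden states, used directly as keys AND values
    w_q:    (d, d) the only learned projection in this sketch
    """
    q = query @ w_q                            # transform the query only
    scores = memory @ q / np.sqrt(q.shape[0])  # dot-product scores, (T,)
    weights = softmax(scores)                  # attention distribution
    return weights @ memory                    # convex combination of memory
```

Dropping the K/V projections saves two `(d, d)` matrix multiplies per step and the associated parameters, which is the kind of saving the "highly efficient" label in Figure 1 appears to refer to.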
Can you provide any further information on the loss function you are using? Perhaps a reference to a paper?
Could you clarify the license of this code base? It would be helpful to state it explicitly and to add a license file to the repo. Thank...
Thanks for sharing this code! I'd like to try it on my own training dataset, but I keep getting GPU OOM problems: `RuntimeError: CUDA out of memory. Tried to allocate...`
Thanks for such a nice paper and work! I'm trying to train a text generation model on my own dataset. The tokenize function in data.py https://github.com/Smerity/sha-rnn/blob/218d748022dbcf32d50bbbb4d151a9b6de3f8bba/data.py#L34 uses split() to tokenize...
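For context, the whitespace-based scheme the issue refers to can be sketched as below, alongside a hypothetical character-level alternative for datasets where splitting on whitespace is unsuitable. Function names and the `<eos>` marker here are illustrative, not necessarily identical to data.py:

```python
def word_tokenize(line):
    # whitespace split, with an end-of-sentence marker appended
    # (the style used by data.py-like corpus loaders)
    return line.split() + ['<eos>']

def char_tokenize(line):
    # hypothetical character-level alternative: every character
    # becomes a token, so no whitespace assumptions are needed
    return list(line) + ['<eos>']
```

A character-level tokenizer like the second sketch sidesteps `split()` entirely, which matters for languages without whitespace word boundaries or for byte/char-level modeling.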
Hi @Smerity , thanks for open-sourcing the code for this great project :heart: I trained a character-based model for German on ~1GB of text (mainly from OPUS). It worked...
Hi @Smerity , could you share the pretrained SHA-RNN weights from your WikiText103 experiments? I'd like to do some fine-tuning experiments with it for text classification. (It would be a...