sha-rnn
Single Headed Attention RNN - "Stop thinking with your head"
Thanks for the great paper! I've created another open-source implementation of the SHA-RNN here: https://github.com/talonvoice/sha-rnn I trained with parameters similar to the single-head model at the end...
I've successfully reimplemented your work in Julia with the Knet DL framework here: [SHA-RNN.jl](https://github.com/alisafaya/SHA-RNN.jl). During training I faced some problems with the first batch of the dataset. Since there is no...
In Figure 1 there's a claim that the attention module is "highly efficient". This is explained by the removal of the K/V transforms. Then for the attention scores block it...
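To make the efficiency claim concrete, the simplification under discussion (only the query is projected, while keys and values are taken directly from the memory, i.e. no K/V transforms) can be sketched roughly as below. This is an illustrative NumPy sketch, not the paper's exact formulation; `w_q` and the scaling are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def simplified_attention(query, memory, w_q):
    """Single-headed attention with no key/value projections (sketch).

    query:  (d,)   current hidden state
    memory: (T, d) past hidden states, used directly as keys AND values
    w_q:    (d, d) the only learned projection in this sketch
    """
    q = query @ w_q                            # transform the query only
    scores = memory @ q / np.sqrt(q.shape[0])  # dot-product scores, (T,)
    weights = softmax(scores)                  # attention distribution
    return weights @ memory                    # convex combination of memory
```

Dropping the K/V projections saves two `(d, d)` matrix multiplies per step and the associated parameters, which is the kind of saving the "highly efficient" label in Figure 1 appears to refer to.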
Can you provide any further information on the loss function you are using? Perhaps a reference to a paper?
Could you clarify the license of this code base? It would be helpful to state it explicitly and to add a license file to the repo. Thank...
Thanks for sharing this code! I'd like to try it on my own training dataset, but I keep getting GPU OOM problems: `RuntimeError: CUDA out of memory. Tried to allocate...`
Thanks for such a nice paper and work! I'm trying to train a text generation model on my own dataset. The tokenize function in data.py https://github.com/Smerity/sha-rnn/blob/218d748022dbcf32d50bbbb4d151a9b6de3f8bba/data.py#L34 uses split() to tokenize...
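For context, the whitespace-based scheme the issue refers to can be sketched as below, alongside a hypothetical character-level alternative for datasets where splitting on whitespace is unsuitable. Function names and the `<eos>` marker here are illustrative, not necessarily identical to data.py:

```python
def word_tokenize(line):
    # whitespace split, with an end-of-sentence marker appended
    # (the style used by data.py-like corpus loaders)
    return line.split() + ['<eos>']

def char_tokenize(line):
    # hypothetical character-level alternative: every character
    # becomes a token, so no whitespace assumptions are needed
    return list(line) + ['<eos>']
```

A character-level tokenizer like the second sketch sidesteps `split()` entirely, which matters for languages without whitespace word boundaries or for byte/char-level modeling.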
Hi @Smerity , thanks for open-sourcing the code for this great project :heart: I trained a character-based model for German on ~1GB of text (mainly from OPUS). It worked...
Hi @Smerity , could you share the pretrained SHA-RNN weights from your WikiText103 experiments? I'd like to do some fine-tuning experiments with it for text classification. (It would be a...