Andrei Nesterov


In the PR I use `copy.deepcopy` to create a `backward_layer` as a copy of the `forward_layer`. Is this an appropriate way to copy layer instances?
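A minimal sketch of what the comment describes, using a hypothetical stand-in class rather than the real Trax layer: the point of `copy.deepcopy` is that the backward layer gets its own weights, independent of the forward layer's.

```python
import copy


class Recurrent:
    """Hypothetical stand-in for an RNN layer (not the actual Trax class)."""

    def __init__(self, units):
        self.units = units
        self.weights = [0.0] * units  # placeholder weights


forward_layer = Recurrent(units=4)

# Deep-copy so the backward layer owns independent weight storage;
# mutating one layer must not affect the other.
backward_layer = copy.deepcopy(forward_layer)

backward_layer.weights[0] = 1.0
assert forward_layer.weights[0] == 0.0  # forward layer is untouched
```

A shallow copy (`copy.copy`) would share the `weights` list between the two layers, which is why `deepcopy` is the safer choice here.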

I also have a question about the RNN implementation in Trax. Why do we [initialize the hidden state](https://github.com/google/trax/blob/v1.3.9/trax/layers/rnn.py#L156) of the GRU and LSTM layers proportionally to the dimension of their inputs?
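To make the question concrete, here is a hypothetical illustration of the pattern being asked about (not the actual Trax code): the hidden-state width is derived from the input dimension rather than being an independent hyperparameter.

```python
def initial_hidden_state(input_dim, factor=1):
    """Hypothetical helper: hidden state sized as a multiple of the
    input dimension, the coupling the question above asks about."""
    return [0.0] * (factor * input_dim)


# The hidden-state size tracks the input size automatically.
state = initial_hidden_state(input_dim=8)
assert len(state) == 8
```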

[Related issue](https://github.com/google/trax/issues/1101)

In the PR above, I've overridden the `_settable_attrs` method of `PretrainedBERT` to allow setting the `init_checkpoint` attribute, which is required for loading the model from its checkpoints.
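A sketch of the shape such an override could take, with a hypothetical base class standing in for the real Trax `Layer` (whose actual signatures may differ): the subclass extends the set of attributes that may be assigned after construction.

```python
class Layer:
    """Hypothetical stand-in for a Trax-style base layer."""

    def _settable_attrs(self):
        # Names of attributes that may be assigned after construction.
        return ('weights', 'state', 'rng')


class PretrainedBERT(Layer):
    def _settable_attrs(self):
        # Extend the default set so `init_checkpoint` can also be set,
        # which the PR needs for restoring the model from a checkpoint.
        return super()._settable_attrs() + ('init_checkpoint',)


assert 'init_checkpoint' in PretrainedBERT()._settable_attrs()
```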