Andrei Nesterov


In the PR I use `copy.deepcopy` to create a `backward_layer` as a copy of the `forward_layer`. Is this an appropriate way to copy layer instances?
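A minimal sketch of what the comment describes, using a hypothetical stand-in class rather than the real Trax layer: the point of `copy.deepcopy` is that the backward layer gets its own weights, independent of the forward layer's.

```python
import copy


class Recurrent:
    """Hypothetical stand-in for an RNN layer (not the actual Trax class)."""

    def __init__(self, units):
        self.units = units
        self.weights = [0.0] * units  # placeholder weights


forward_layer = Recurrent(units=4)

# Deep-copy so the backward layer owns independent weight storage;
# mutating one layer must not affect the other.
backward_layer = copy.deepcopy(forward_layer)

backward_layer.weights[0] = 1.0
assert forward_layer.weights[0] == 0.0  # forward layer is untouched
```

A shallow copy (`copy.copy`) would share the `weights` list between the two layers, which is why `deepcopy` is the safer choice here.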

I also have a question about the RNN implementation in Trax. Why do we [initialize the hidden state](https://github.com/google/trax/blob/v1.3.9/trax/layers/rnn.py#L156) of the GRU and LSTM layers proportionally to the dimension of their inputs?
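To make the question concrete, here is a hypothetical illustration of the pattern being asked about (not the actual Trax code): the hidden-state width is derived from the input dimension rather than being an independent hyperparameter.

```python
def initial_hidden_state(input_dim, factor=1):
    """Hypothetical helper: hidden state sized as a multiple of the
    input dimension, the coupling the question above asks about."""
    return [0.0] * (factor * input_dim)


# The hidden-state size tracks the input size automatically.
state = initial_hidden_state(input_dim=8)
assert len(state) == 8
```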

[Related issue](https://github.com/google/trax/issues/1101)

In the PR above, I've overridden the `_settable_attrs` method of `PretrainedBERT` to allow setting the `init_checkpoint` attribute, which is required for loading the model from its checkpoints.
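A sketch of the shape such an override could take, with a hypothetical base class standing in for the real Trax `Layer` (whose actual signatures may differ): the subclass extends the set of attributes that may be assigned after construction.

```python
class Layer:
    """Hypothetical stand-in for a Trax-style base layer."""

    def _settable_attrs(self):
        # Names of attributes that may be assigned after construction.
        return ('weights', 'state', 'rng')


class PretrainedBERT(Layer):
    def _settable_attrs(self):
        # Extend the default set so `init_checkpoint` can also be set,
        # which the PR needs for restoring the model from a checkpoint.
        return super()._settable_attrs() + ('init_checkpoint',)


assert 'init_checkpoint' in PretrainedBERT()._settable_attrs()
```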