Rajarshee Mitra


Also, if it doesn't work for the initial setup I described, I would still be happy to get a solution for the scenario where all dimensions are known!

Building from source was a workaround for me.

Do you have any comparison between results with and without BERT on NMT tasks?

It's kind of strange that the hidden states are, by default, not exposed :/

"I think it will use RNN hidden states as the logits, and argmax on the hidden state to try to get a word id." It looks very undesirable but then...

@oahziur I don't get your first point. How can we compute the hidden states of all steps in the first place without using the output layer and taking the argmax...
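For context, here is a stripped-down sketch of what a single decode step does in this design (assuming TF 1.x-style `BasicDecoder` semantics; the function and variable names are illustrative, not the actual TF source). The point is that the raw hidden state has no vocabulary meaning on its own: it must pass through the output layer before an argmax can yield a word id.

```python
import tensorflow as tf

def decode_step(cell, output_layer, inputs, state):
    """One greedy decode step (illustrative sketch).

    The argmax is taken over the projected logits, not the raw
    hidden state, which is why the output layer is needed at
    every step of decoding.
    """
    hidden, next_state = cell(inputs, state)    # raw RNN hidden state
    logits = output_layer(hidden)               # project to vocab size
    word_ids = tf.argmax(logits, axis=-1,
                         output_type=tf.int32)  # greedy word choice
    return hidden, logits, word_ids, next_state
```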

Yeah, I get your point. But if I am not using teacher forcing (i.e., using `GreedyEmbeddingHelper`), I would want my predicted ids to be used. And for that to happen,...
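For reference, a minimal sketch of wiring `GreedyEmbeddingHelper` into `BasicDecoder` so that each step feeds back the embedding of the previous prediction (TF 1.x contrib API; sizes and names like `embedding_matrix` and `encoder_state` are placeholders for pieces defined elsewhere in a real model):

```python
import tensorflow as tf

batch_size, vocab_size, max_len = 32, 10000, 50
start_id, end_id = 1, 2
embedding_matrix = tf.get_variable("embedding", [vocab_size, 256])
decoder_cell = tf.nn.rnn_cell.LSTMCell(512)
encoder_state = decoder_cell.zero_state(batch_size, tf.float32)

# Feeds the embedding of the previous argmax prediction at each step,
# instead of the ground-truth token (no teacher forcing).
helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    embedding=embedding_matrix,
    start_tokens=tf.fill([batch_size], start_id),
    end_token=end_id)

decoder = tf.contrib.seq2seq.BasicDecoder(
    cell=decoder_cell,
    helper=helper,
    initial_state=encoder_state,
    output_layer=tf.layers.Dense(vocab_size, use_bias=False))

# outputs.rnn_output holds the projected logits and outputs.sample_id
# the greedy predictions; the raw hidden states are not in the tuple.
outputs, final_state, seq_lens = tf.contrib.seq2seq.dynamic_decode(
    decoder, maximum_iterations=max_len)
```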

So, I need to hack my way into it to use `output_layer` as a part of the decoder and also make `dynamic_decode` return hidden states. Any suggestions about what...

This is what I did: added a new attribute `final_output` to the `BasicDecoderOutput` namedtuple that stores the projected outputs whenever `BasicDecoder` has an `output_layer`. In the `step()` of `BasicDecoder`,...
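A minimal sketch of what that change could look like, assuming the TF 1.x `contrib.seq2seq` internals (`self._cell`, `self._helper`, `self._output_layer`). The subclass and its output tuple names are mine; `final_output` follows the comment above. `rnn_output` now carries the raw hidden states, so `dynamic_decode` returns them alongside the logits:

```python
import collections
import tensorflow as tf
from tensorflow.contrib.seq2seq import BasicDecoder

# Extended output tuple: raw cell output (hidden state) stays in
# `rnn_output`, projected logits go into the new `final_output` field.
HiddenStateDecoderOutput = collections.namedtuple(
    "HiddenStateDecoderOutput", ("rnn_output", "final_output", "sample_id"))

class HiddenStateDecoder(BasicDecoder):
    """BasicDecoder variant that exposes hidden states next to logits.

    A sketch, not the stock API: the class name is hypothetical.
    """

    @property
    def output_size(self):
        size = super(HiddenStateDecoder, self).output_size
        return HiddenStateDecoderOutput(
            rnn_output=self._cell.output_size,  # raw hidden-state size
            final_output=size.rnn_output,       # projected (vocab) size
            sample_id=size.sample_id)

    @property
    def output_dtype(self):
        dtype = super(HiddenStateDecoder, self).output_dtype
        return HiddenStateDecoderOutput(
            rnn_output=dtype.rnn_output,
            final_output=dtype.rnn_output,
            sample_id=dtype.sample_id)

    def step(self, time, inputs, state, name=None):
        # Same flow as BasicDecoder.step(), but the pre-projection
        # cell output is kept instead of being overwritten.
        cell_outputs, cell_state = self._cell(inputs, state)
        projected = cell_outputs
        if self._output_layer is not None:
            projected = self._output_layer(cell_outputs)
        sample_ids = self._helper.sample(
            time=time, outputs=projected, state=cell_state)
        finished, next_inputs, next_state = self._helper.next_inputs(
            time=time, outputs=projected, state=cell_state,
            sample_ids=sample_ids)
        outputs = HiddenStateDecoderOutput(
            cell_outputs, projected, sample_ids)
        return (outputs, next_state, next_inputs, finished)
```

Because `dynamic_decode` builds its result TensorArrays from `output_size`/`output_dtype`, the three-field tuple flows through unchanged, and the final outputs expose hidden states, logits, and sampled ids together.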