Rajarshee Mitra
Also, if it doesn't work for the initial setup I described, I would still be happy to have a solution for the scenario where all dimensions are known!
Building from source was a workaround for me.
Do you have any comparison of NMT performance with and without BERT?
Need attention, please!
It's kind of strange that the hidden states are, by default, not exposed :/
"I think it will use RNN hidden states as the logits, and argmax on the hidden state to try to get a word id." It looks very undesirable but then...
@oahziur I don't get your first point. How can we compute the hidden states of all steps in the first place without using the output layer and taking the argmax...
Yeah, I get your point. But if I am not using teacher forcing (i.e., using ```GreedyEmbeddingHelper```), I would want my predicted ids to be used. And for that to happen,...
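For reference, this is roughly how that decode path looks in the TF 1.x ```contrib.seq2seq``` API (a minimal sketch; the cell, sizes, and start/end token ids below are placeholder assumptions, not taken from my actual model). With ```GreedyEmbeddingHelper```, the projected output at each step is argmaxed into ```sample_id``` and fed back as the next input, and ```dynamic_decode``` only returns the projected ```rnn_output``` plus ```sample_id```, never the raw per-step hidden states:

```python
import tensorflow as tf
from tensorflow.contrib import seq2seq

# Placeholder model pieces -- assumed sizes and token ids for illustration.
vocab_size, embed_dim, hidden_dim, batch_size = 1000, 64, 128, 32
embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
cell = tf.nn.rnn_cell.GRUCell(hidden_dim)
projection = tf.layers.Dense(vocab_size, use_bias=False)

# Greedy inference: each step embeds the previous argmax prediction.
helper = seq2seq.GreedyEmbeddingHelper(
    embedding,
    start_tokens=tf.fill([batch_size], 1),  # assumed <sos> id
    end_token=2)                            # assumed <eos> id

decoder = seq2seq.BasicDecoder(
    cell, helper,
    initial_state=cell.zero_state(batch_size, tf.float32),
    output_layer=projection)

# rnn_output here is already projected to vocab logits; the raw GRU
# hidden states are consumed internally and never surfaced.
outputs, final_state, lengths = seq2seq.dynamic_decode(
    decoder, maximum_iterations=50)
logits, predicted_ids = outputs.rnn_output, outputs.sample_id
```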
So, I need to hack my way into it to use ```output_layer``` as part of the decoder and also make ```dynamic_decode``` return hidden states. Any suggestions about what...
This is what I did: added a new field ```final_output``` to the ```BasicDecoderOutput``` namedtuple that stores the projected outputs whenever there is an ```output_layer``` in ```BasicDecoder```. In the ```step()``` of ```BasicDecoder```,...
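For anyone who wants roughly the same effect without patching the TF source, a subclass can carry the extra field. This is only a sketch under TF 1.x: the names ```HiddenStateDecoder``` / ```HiddenStateDecoderOutput``` are made up here, and it leans on ```BasicDecoder```'s private attributes (```_cell```, ```_output_layer```, ```_helper```), which are internals and may change:

```python
import collections
import tensorflow as tf
from tensorflow.contrib.seq2seq import BasicDecoder

# Hypothetical output tuple: `rnn_output` keeps the raw cell output
# (the hidden state) and `final_output` the projected logits.
HiddenStateDecoderOutput = collections.namedtuple(
    "HiddenStateDecoderOutput", ("rnn_output", "final_output", "sample_id"))

class HiddenStateDecoder(BasicDecoder):
    """BasicDecoder variant that also surfaces pre-projection outputs."""

    @property
    def output_size(self):
        base = super(HiddenStateDecoder, self).output_size
        return HiddenStateDecoderOutput(
            rnn_output=self._cell.output_size,  # raw hidden state size
            final_output=base.rnn_output,       # projected (vocab) size
            sample_id=base.sample_id)

    @property
    def output_dtype(self):
        base = super(HiddenStateDecoder, self).output_dtype
        return HiddenStateDecoderOutput(
            rnn_output=tf.float32,  # assumes float32 cell states
            final_output=base.rnn_output,
            sample_id=base.sample_id)

    def step(self, time, inputs, state, name=None):
        cell_outputs, cell_state = self._cell(inputs, state)
        raw_outputs = cell_outputs  # keep the hidden state before projection
        if self._output_layer is not None:
            cell_outputs = self._output_layer(cell_outputs)
        sample_ids = self._helper.sample(
            time=time, outputs=cell_outputs, state=cell_state)
        finished, next_inputs, next_state = self._helper.next_inputs(
            time=time, outputs=cell_outputs, state=cell_state,
            sample_ids=sample_ids)
        outputs = HiddenStateDecoderOutput(raw_outputs, cell_outputs, sample_ids)
        return outputs, next_state, next_inputs, finished
```

Since ```dynamic_decode``` stacks whatever structure ```step()``` returns (sized by ```output_size```/```output_dtype```), the final outputs then expose ```outputs.rnn_output``` (hidden states) alongside ```outputs.final_output``` (logits) without any further changes.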