Megatron-LM
[QUESTION] Why replace F.embedding() with [] in the VocabParallelEmbedding class?
@jon-barker Hello Jon, I have some questions about the embedding layer. Could you help explain? Why was F.embedding(masked_input, self.weight) replaced with self.weight[masked_input] in the forward() function of the VocabParallelEmbedding class? What is the difference between them? Why can F.embedding() introduce 'non-determinism'?
Link: https://github.com/NVIDIA/Megatron-LM/blob/core_r0.5.0/megatron/core/tensor_parallel/layers.py#L218
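
To make the question concrete, here is a minimal sketch comparing the two lookups on CPU. It is my own illustration, not code from Megatron-LM: the tensor sizes and the names weight_a and weight_b are made up, and only masked_input mirrors the name used in forward(). In this small case the two forms appear to give the same forward output and the same weight gradient, so I assume any difference would have to come from the GPU backward kernels, which this sketch does not exercise.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes for illustration only; not taken from Megatron-LM.
vocab_size, hidden_size = 8, 4

# Two independent copies of the same weight so the gradients can be compared.
weight_a = torch.randn(vocab_size, hidden_size, requires_grad=True)
weight_b = weight_a.detach().clone().requires_grad_(True)

# Token ids with a repeated entry (3), so one weight row receives
# several gradient contributions during the backward pass.
masked_input = torch.tensor([[1, 3, 3], [0, 3, 7]])

# The two lookups the question asks about.
out_embedding = F.embedding(masked_input, weight_a)
out_indexing = weight_b[masked_input]

# Forward pass: both simply gather rows of the weight matrix.
forward_equal = torch.equal(out_embedding, out_indexing)

# Backward pass: both accumulate the output gradient into the selected rows.
# Using sum() makes every contribution exactly 1.0, so the accumulated values
# are small integers and do not depend on the order of the additions.
out_embedding.sum().backward()
out_indexing.sum().backward()
grads_equal = torch.equal(weight_a.grad, weight_b.grad)

print("forward outputs equal:", forward_equal)
print("weight gradients equal:", grads_equal)
```

With real-valued gradients and repeated token ids, the order in which contributions are added to a row could matter because floating-point addition is not associative. Whether that is the reason for the change in VocabParallelEmbedding is exactly what I am asking.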