Where is the Factorized Embedding Parameterization implemented?
Hi all, I read the paper and some of the code. The paper indicates that there is an intermediate embedding size E that factorizes the V -> H embedding lookup table into V -> E -> H matrices. However, the code at https://github.com/google-research/albert/blob/c21d8a3616a4b156d21e795698ad52743ccd8b73/modeling.py#L199-L206 seems to map the embedding directly from the input tensor.
So where is the intermediate V × E matrix? Am I missing something?
I have the same question and would like some confirmation on this, but from what I understand, the shapes of self.word_embedding_output, self.output_embedding_table, and self.embedding_output are with respect to the embedding size E. Since the code operates on batches, instead of the (V, E) matrix you are looking at the (batch_size, seq_length, E) lookup result, which is self.embedding_output. This is later fed into the transformer model and projected to hidden_size, giving (batch_size, seq_length, H), in the following:
https://github.com/google-research/albert/blob/c21d8a3616a4b156d21e795698ad52743ccd8b73/modeling.py#L1085-L1087
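For anyone who wants to see the two factors end to end, here is a minimal runnable sketch in TensorFlow. The sizes, variable names, and layer name are illustrative assumptions rather than ALBERT's actual configuration or code; the comments note which tensors roughly correspond to the attributes mentioned above:

```python
import tensorflow as tf

# Illustrative sizes (assumptions, not ALBERT's real config):
# V = vocab size, E = embedding size, H = hidden size, with E << H.
V, E, H = 30000, 128, 768
batch_size, seq_length = 2, 16

# Factor 1: the (V, E) lookup table. This plays the role of
# self.output_embedding_table in modeling.py.
embedding_table = tf.Variable(tf.random.normal([V, E]), name="word_embeddings")

# Batched lookup produces (batch_size, seq_length, E), the analogue of
# self.embedding_output -- which is why you never see a bare (V, E) -> (V, H)
# matmul in the lookup code.
input_ids = tf.random.uniform([batch_size, seq_length], maxval=V, dtype=tf.int32)
embedding_output = tf.nn.embedding_lookup(embedding_table, input_ids)

# Factor 2: the (E, H) projection applied before the transformer stack,
# analogous to the projection at modeling.py#L1085-L1087 linked above.
# The layer name here is a hypothetical stand-in.
projection = tf.keras.layers.Dense(H, name="embedding_hidden_mapping_in")
hidden_input = projection(embedding_output)

print(embedding_output.shape)  # (2, 16, 128) -> embedding size E
print(hidden_input.shape)      # (2, 16, 768) -> hidden size H
```

The key point is that the (V, E) table plus the (E, H) projection replace a single (V, H) table, cutting the embedding parameters from V×H to V×E + E×H when E << H.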
Thanks @asharma20!