How important is it to use sentence_output_nheads?
From the code at https://github.com/stangelid/qt/blob/c136ac00e03adf443b90cd65ba0523a3617be01f/src/encoders.py#L37, the encoder creates H, the number of sentence heads, by adding a linear + norm layer that transforms the CLS token representation from (batch_size, nsents, 1, model_d) into (batch_size, nsents, sentence_output_nheads, new_model_d). I wonder why we need this extra layer, instead of feeding the original (batch_size, nsents, 1, model_d) tensor directly into the quantization layer.
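To make sure I'm reading the code correctly, here is a minimal sketch of what I understand that step to do (the names `sent_proj`, `norm`, and `head_dim` are my own placeholders, not identifiers from the repo):

```python
import torch
import torch.nn as nn

batch_size, nsents, model_d = 8, 10, 320
nheads, head_dim = 8, 40  # sentence_output_nheads and the per-head dimension

# one CLS vector per sentence
cls = torch.randn(batch_size, nsents, 1, model_d)

# the extra linear + norm layer that fans the single CLS vector
# out into `nheads` separate head vectors
sent_proj = nn.Linear(model_d, nheads * head_dim)
norm = nn.LayerNorm(nheads * head_dim)

h = norm(sent_proj(cls.squeeze(2)))               # (batch, nsents, nheads * head_dim)
h = h.view(batch_size, nsents, nheads, head_dim)  # (batch, nsents, nheads, head_dim)
# each of the `nheads` vectors is then quantized separately
```

My question is essentially: what does quantizing the `nheads` projected vectors buy over quantizing the single (batch_size, nsents, 1, model_d) CLS vector directly?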
I suspect you experimented with this and chose the design deliberately, so I would love to hear more about the reasoning. Thanks in advance for your response.