
change TextTokenizer 2DConvolution to 1D

Open simonlevine opened this issue 4 years ago • 1 comment

Hello,

It seems more intuitive to use a 1D convolution here over the embedding, with the number of input channels equal to the word embedding dimension, rather than the edge case of a 2D convolution as currently implemented. I would personally make this change to match other networks that apply similar convolutions over nn.Embedding outputs. I believe this would not change performance; it is suggested purely for clarity. Thank you
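To make this concrete, here is a minimal sketch (not the repository's actual code; `embed_dim`, `n_filters`, and `k` are illustrative) showing that a 2D convolution whose kernel spans the full embedding dimension computes the same thing as a 1D convolution with `in_channels = embed_dim`:

```python
import torch
import torch.nn as nn

batch, seq_len, vocab = 2, 16, 100
embed_dim, n_filters, k = 64, 128, 3

emb = nn.Embedding(vocab, embed_dim)
x = emb(torch.randint(0, vocab, (batch, seq_len)))  # (batch, seq_len, embed_dim)

# Current variant: a 2D conv whose kernel width covers the whole embedding,
# so the "2D" convolution only ever slides along the sequence axis.
conv2d = nn.Conv2d(1, n_filters, kernel_size=(k, embed_dim), padding=(k // 2, 0))

# Proposed variant: a 1D conv treating the embedding dimension as channels.
conv1d = nn.Conv1d(embed_dim, n_filters, kernel_size=k, padding=k // 2)

# Copy the 2D weights into the 1D layer; the two layers are then identical.
with torch.no_grad():
    conv1d.weight.copy_(conv2d.weight.squeeze(1).permute(0, 2, 1))
    conv1d.bias.copy_(conv2d.bias)

y2d = conv2d(x.unsqueeze(1)).squeeze(-1)  # (batch, n_filters, seq_len)
y1d = conv1d(x.transpose(1, 2))           # (batch, n_filters, seq_len)
print(torch.allclose(y2d, y1d, atol=1e-6))  # True
```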

simonlevine • Jan 07 '22

Hello,

Thank you. Yes, it would not result in any significant difference; at least, that is what we observed in our experiments. I'll check this PR and create separate models with 1D convs, because replacing the existing models would cause a state-dict key mismatch with our old checkpoints.
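For illustration, a minimal sketch of the checkpoint problem, with hypothetical stand-in modules (not the repository's actual code): the saved state_dict entry keeps its key but no longer matches the new layer's tensor shape, so strict loading fails.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the old and new tokenizers; sizes are
# illustrative, not the repository's actual modules.
old = nn.Sequential(nn.Conv2d(1, 128, kernel_size=(3, 64)))  # weight: (128, 1, 3, 64)
new = nn.Sequential(nn.Conv1d(64, 128, kernel_size=3))       # weight: (128, 64, 3)

ckpt = old.state_dict()
try:
    # Same key ("0.weight"), different tensor shape, so strict loading fails;
    # renaming the attribute would instead surface missing/unexpected keys.
    new.load_state_dict(ckpt)
except RuntimeError as err:
    print(err)  # size mismatch for 0.weight: ...
```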

alihassanijr • Jan 10 '22