Change TextTokenizer 2D convolution to 1D
Hello,
It seems more intuitive to use a 1D convolution here over the embedding, with the channel size equal to the word embedding dimension, rather than the edge case of a 2D convolution as currently implemented. I would personally make this change to match other networks that apply similar convolutions over nn.Embedding outputs. I believe this would not change performance; it is suggested purely for clarity. Thank you
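For concreteness, here is a minimal sketch of the two formulations (assuming PyTorch; the shapes and names are illustrative, not the repo's actual TextTokenizer). A 2D conv over a single-channel "image" whose width spans the full embedding dimension computes the same linear map as a 1D conv with in_channels equal to the embedding dimension:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, kernel_size, out_channels = 1000, 128, 3, 256
tokens = torch.randint(0, vocab_size, (8, 32))   # (batch, seq_len)
embed = nn.Embedding(vocab_size, embed_dim)
x = embed(tokens)                                # (batch, seq_len, embed_dim)

# Current style: 2D conv whose kernel spans the full embedding width.
conv2d = nn.Conv2d(1, out_channels, kernel_size=(kernel_size, embed_dim))
y2d = conv2d(x.unsqueeze(1)).squeeze(-1)         # (batch, out_channels, L)

# Proposed style: 1D conv with in_channels == embed_dim.
conv1d = nn.Conv1d(embed_dim, out_channels, kernel_size=kernel_size)
y1d = conv1d(x.transpose(1, 2))                  # (batch, out_channels, L)

# Same parameter count; with matching weights the outputs agree.
with torch.no_grad():
    # Conv2d weight (out, 1, k, d) -> Conv1d weight (out, d, k)
    conv1d.weight.copy_(conv2d.weight.squeeze(1).permute(0, 2, 1))
    conv1d.bias.copy_(conv2d.bias)
print(torch.allclose(conv2d(x.unsqueeze(1)).squeeze(-1),
                     conv1d(x.transpose(1, 2)), atol=1e-5))  # True
```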
Hello,
Thank you. Yes, it should not make any significant difference; at least that is what we observed in our experiments. I'll check this PR and create separate models with 1D convs, because I think we'll hit a state-dict key mismatch with our old checkpoints if we replace the existing models.
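If it helps, since the weight reshape above is exact, old checkpoints could in principle be migrated in place instead of shipping separate models. A rough sketch (the key `tokenizer.conv.weight` and the file path are hypothetical; the real keys depend on the module hierarchy):

```python
import torch

# Hypothetical checkpoint path and key names for illustration only.
state = torch.load("old_checkpoint.pt", map_location="cpu")
w = state["tokenizer.conv.weight"]                 # (out, 1, k, embed_dim)
state["tokenizer.conv.weight"] = w.squeeze(1).permute(0, 2, 1)  # (out, embed_dim, k)
# The bias has the same shape for Conv2d and Conv1d, so only the weight changes.
torch.save(state, "migrated_checkpoint.pt")
```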