Pretrained Embedding
How did you pretrain the embeddings? I am confused because they are based on byte pair encoding.
Sorry for the delayed response. I trained word2vec on byte-pair-tokenized data. In practice, I noticed very little difference in performance between the pretrained and the randomly initialized byte pair embeddings.
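For anyone who wants to try the same comparison, here is a minimal sketch of the two initializations. It is not the code from this repo: the vocabulary, the `@@` continuation marker, the vector values, and the helper name `build_embedding_matrix` are all made up for illustration. The `pretrained` dict stands in for vectors produced by a word2vec run over byte-pair-tokenized text.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0, scale=0.1):
    """Build one embedding row per vocabulary entry.

    Rows are copied from `pretrained` when a vector of the right size exists;
    every other row is drawn uniformly from [-scale, scale].
    Returns the matrix and the number of rows taken from `pretrained`.
    """
    rng = random.Random(seed)
    matrix = []
    n_pretrained = 0
    for token in vocab:
        vec = pretrained.get(token)
        if vec is not None and len(vec) == dim:
            matrix.append(list(vec))
            n_pretrained += 1
        else:
            matrix.append([rng.uniform(-scale, scale) for _ in range(dim)])
    return matrix, n_pretrained


# Hypothetical byte-pair vocabulary ("@@" marks a subword that continues).
vocab = ["<pad>", "<unk>", "em@@", "bed@@", "ding", "the"]

# Hypothetical vectors standing in for word2vec output on BPE-tokenized data.
pretrained = {
    "em@@": [0.1, 0.2, 0.3],
    "ding": [0.4, 0.5, 0.6],
    "the": [0.7, 0.8, 0.9],
}

# Variant 1: start from the pretrained subword vectors where available.
pretrained_init, hits = build_embedding_matrix(vocab, pretrained, dim=3)

# Variant 2: ignore the pretrained vectors and initialize everything randomly.
random_init, _ = build_embedding_matrix(vocab, {}, dim=3)
```

Either matrix can then be loaded into the model's embedding layer before training, so the two runs differ only in how the embeddings start out.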