sentencepiece
Guidance on how to implement subword sampling at train time
I guess I should re-sample the tokenization of the training data with SentencePiece before each epoch, but it would be nice to see a canonical implementation of this in $FRAMEWORK.
Will do.
Any update on this?
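In the meantime, here is a rough, framework-agnostic sketch of the per-epoch re-sampling idea from the original post. It is not a canonical implementation. It assumes the `sentencepiece` Python package and an already trained model file. The names `make_sampling_encoder` and `resampled_epochs`, the model path, and the default `alpha` and `nbest_size` values are illustrative assumptions, not something from this thread.

```python
import random
from typing import Callable, Iterator, List, Sequence


def make_sampling_encoder(model_path: str, alpha: float = 0.1,
                          nbest_size: int = -1) -> Callable[[str], List[int]]:
    """Return a function that draws one sampled segmentation per call.

    model_path is a hypothetical path to a trained SentencePiece model.
    """
    import sentencepiece as spm  # imported here so the rest runs without it

    sp = spm.SentencePieceProcessor(model_file=model_path)

    def encode(text: str) -> List[int]:
        # enable_sampling=True makes each call return a (possibly) different
        # segmentation of the same text instead of the single best one.
        return sp.encode(text, out_type=int, enable_sampling=True,
                         alpha=alpha, nbest_size=nbest_size)

    return encode


def resampled_epochs(sentences: Sequence[str],
                     encode: Callable[[str], List[int]],
                     num_epochs: int,
                     seed: int = 0) -> Iterator[List[List[int]]]:
    """Yield one freshly tokenized copy of the training data per epoch."""
    rng = random.Random(seed)  # controls the shuffle only, not the sampling
    for _ in range(num_epochs):
        order = list(range(len(sentences)))
        rng.shuffle(order)
        # Re-encode every sentence, so each epoch sees new segmentations.
        yield [encode(sentences[i]) for i in order]
```

Usage would look roughly like `for epoch_data in resampled_epochs(train_sentences, make_sampling_encoder("spm.model"), num_epochs=10): ...`, with batching and padding left to whatever framework is in use. Re-encoding lazily inside the data loader instead of once per epoch should work equally well. The key point is that the training text is stored raw and tokenized at read time rather than preprocessed once.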