minbpe

When using minBPE, token-encoded sentence vectors need to be padded

Open • elevateclub opened this issue 11 months ago • 1 comment

Without padding, the encoded sentences end up with different lengths, so stacking them into a batch fails at data-loading time.
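A minimal sketch of right-padding, assuming each sentence has already been encoded into a list of integer token ids (for example by a minbpe tokenizer's `encode`). The helper `pad_batch` is hypothetical, not part of minbpe. It pads every sequence to the longest one in the batch and returns an attention mask so downstream code can tell real tokens from padding. `pad_id=0` is used purely for illustration: in minbpe's base vocabulary id 0 is the byte `0x00`, so real code should rely on the mask (and skip padded positions in the loss) or use a reserved id.

```python
from typing import List, Tuple


def pad_batch(seqs: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id lists to the longest length in the batch.

    Returns the padded ids and an attention mask (1 = real token, 0 = padding).
    """
    max_len = max((len(s) for s in seqs), default=0)
    padded = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return padded, mask


# Hypothetical token ids for three sentences of lengths 3, 5 and 1.
batch = [[72, 101, 300], [87, 111, 114, 108, 100], [33]]
ids, mask = pad_batch(batch, pad_id=0)
print(ids)   # [[72, 101, 300, 0, 0], [87, 111, 114, 108, 100], [33, 0, 0, 0, 0]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1], [1, 0, 0, 0, 0]]
```

Every row now has the same length, so the batch can be stacked into a single rectangular tensor.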

elevateclub • Mar 19 '24 06:03

This would probably require introducing a special padding token, which might make the code feel a bit more edge-casey and dig us deeper into the ugliness that is tokenization 😢
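One way to add such a padding token, sketched under assumptions: give padding the first id beyond the learned vocabulary and any registered special tokens, so it cannot collide with a real byte or merge, and strip it out again before decoding. The helpers `reserve_pad_id` and `strip_padding` are hypothetical, not part of minbpe, and the toy vocabulary only mimics minbpe's layout (byte tokens 0 to 255, merged tokens from 256 upward).

```python
from typing import Dict, List


def reserve_pad_id(vocab: Dict[int, bytes], special_tokens: Dict[str, int]) -> int:
    """Return the first id not used by the learned vocab or existing special tokens."""
    used = set(vocab) | set(special_tokens.values())
    return max(used, default=-1) + 1


def strip_padding(ids: List[int], pad_id: int) -> List[int]:
    """Drop padding ids before decoding, so the tokenizer never sees an unknown id."""
    return [i for i in ids if i != pad_id]


# Toy vocabulary laid out like minbpe's (assumed): 256 byte tokens plus two merges.
vocab = {i: bytes([i]) for i in range(256)}
vocab[256] = b"th"
vocab[257] = b"the"

pad_id = reserve_pad_id(vocab, special_tokens={})
print(pad_id)  # 258
print(strip_padding([257, 32, 100, pad_id, pad_id], pad_id))  # [257, 32, 100]
```

A model consuming these ids would need an embedding table one row larger than the vocabulary to hold the pad id.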

elevateclub • Mar 19 '24 06:03