Fix BytePair special tokens tokenization

Open abuelnasr0 opened this issue 1 year ago • 2 comments

BytePair already tokenizes special tokens, but it had a small nit, explained in keras-team/keras-nlp#1435. This PR fixes it.

abuelnasr0 Feb 20 '24 13:02

Thanks very much @abuelnasr0! I'm finally freeing up from our Gemma release. I'll try to review #1447, #1445 and #1397 as a set, but just a heads up: I'll probably post feedback next week.

In the meantime, if you are looking for something to do, we still need BloomCausalLM. I'm hoping to do some refactoring (#1425) that will make adding generative classes way easier, but no need to block on that.

mattdangerw Feb 23 '24 00:02

@mattdangerw No problem, take your time. The Gemma release was awesome work from you and the team. BloomCausalLM is already in my plans, but I was a little bit busy. I started adding it a few days ago and will continue today. Maybe I will open a PR today.

abuelnasr0 Feb 24 '24 18:02