Matt Watson
It might be good to build the script specifically for BERT right now, and then split off components as we get a better understanding of what we need.
Assigning this to @aflah02 as I think you are the one actively working on this.
Assigning myself as a placeholder, I believe we may already have some people to work on this.
Thank you!
Also, re: beam search, a separate PR sounds good!
I think a pull request landed recently where we stopped doing seeded random generation because of discrepancies: https://github.com/keras-team/keras-nlp/pull/269. Is this safe to land as-is, @chenmoneygithub @jessechancy?
@chenmoneygithub do you know why the accelerator testing is failing here? This would be a great one to actually test on accelerators.
There are a few different ways we could do this. We can't use only the `TokenAndPositionEmbedding`, as the BERT model also needs a segment embedding. The best approach for now might...
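For context on why `TokenAndPositionEmbedding` alone isn't enough: BERT sums three learned embeddings (token, position, and segment) before the transformer layers. A minimal NumPy sketch of that summation, with illustrative names and randomly initialized tables rather than real trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, num_segments, max_len, hidden_dim = 1000, 2, 8, 16

# Learned lookup tables (randomly initialized here for illustration).
token_table = rng.normal(size=(vocab_size, hidden_dim))
position_table = rng.normal(size=(max_len, hidden_dim))
segment_table = rng.normal(size=(num_segments, hidden_dim))

def bert_embed(token_ids, segment_ids):
    """Sum token, position, and segment embeddings, BERT-style."""
    positions = np.arange(len(token_ids))
    return (
        token_table[token_ids]
        + position_table[positions]
        + segment_table[segment_ids]
    )

tokens = np.array([101, 7, 9, 102, 15, 27, 102, 0])
segments = np.array([0, 0, 0, 0, 1, 1, 1, 0])
out = bert_embed(tokens, segments)
print(out.shape)  # (8, 16)
```

A segment embedding layer would be the missing third table here; `TokenAndPositionEmbedding` covers only the first two.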
For BERT finetuning with pretrained weights, we still need a story for downloading those pretrained weights (this is actively under discussion). Let's hold off on rewriting any existing examples as...