Matt Watson
Yeah, we have an issue open for a sentence piece tokenizer https://github.com/keras-team/keras-nlp/issues/27. I will get to that in the next week or two hopefully! Re https://farasa.qcri.org/ we could definitely support...
Keep pattern is the regex of tokens to keep that you split on. Split pattern is the regex of tokens to split on. Sounds like you would like to split...
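To make the keep/split distinction above concrete, here is a minimal sketch using only Python's standard `re` module. It is not the keras-nlp implementation. The helper name `split_and_keep` and the example patterns are hypothetical, and it assumes the split pattern contains no capturing groups of its own.

```python
import re


def split_and_keep(text, split_pattern, keep_pattern):
    """Split `text` on `split_pattern`, keeping delimiters that match `keep_pattern`.

    Hypothetical helper for illustration only. Assumes `split_pattern`
    has no capturing groups.
    """
    # Wrapping the pattern in a capturing group makes re.split return
    # the delimiters at odd indices of the result.
    pieces = re.split(f"({split_pattern})", text)
    tokens = []
    for i, piece in enumerate(pieces):
        if not piece:
            continue  # empty strings appear between adjacent delimiters
        is_delimiter = i % 2 == 1
        if is_delimiter and not re.fullmatch(keep_pattern, piece):
            continue  # split on it, but do not keep it as a token
        tokens.append(piece)
    return tokens


# Split on whitespace and punctuation, but keep only the punctuation.
tokens = split_and_keep("Hello, world!", r"\s|[.,!?]", r"[.,!?]")
```

Here whitespace is split on and dropped, while punctuation is split on and kept as its own token.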
But given the use case you are talking about, it is probably easier to just pad with a start token after tokenization. Then there are no caveats of...
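As a rough sketch of padding with a start token after tokenization, in plain Python on a list of token ids. The function name `add_start_token`, the id values, and the optional truncation to a fixed length are all assumptions for illustration, not part of any library API.

```python
def add_start_token(token_ids, start_id, sequence_length=None):
    """Prepend `start_id` to already-tokenized ids (hypothetical helper)."""
    padded = [start_id] + list(token_ids)
    if sequence_length is not None:
        # Assumption: trim from the end so the start token always survives.
        padded = padded[:sequence_length]
    return padded


# Example ids are made up; 1 stands in for the start token id.
with_start = add_start_token([5, 6, 7], start_id=1)
```

Because the start token is added after tokenization, the tokenizer never needs to recognize it in raw text.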
Hmm, I'm not sure we would want to support people doing math on our default regex pattern, that would be a compat nightmare. Something like this would work ``` split_pattern="\s|[!-/:-@^-`{-~]",...
@chenmoneygithub @fchollet let me know what you think of this. We definitely need some sort of automated testing here. I think this could be a good template for integration tests...
I think I also like this as a forcing function for simple "out of box" use. Needing to write a single, smallish test that runs your whole training pipeline is...
Talked with @fchollet on this, we should do a few things. 1) Move as much logic as possible out of the runnable script files into `bert_model.py` (and potentially add a...
Thanks for filing! I think we could clear up this issue by adding `keras.utils.register_keras_serializable(package="keras_nlp")` annotations to our layers. This would support h5, but also force our tf-style saved model loading...
Guide is incoming https://github.com/keras-team/keras-io/pull/859
@ddofer this is incoming! And top priority for us actually.