Matt Watson comments

Results: 339 comments by Matt Watson

Arabic Tokenizer

Yeah, we have an issue open for a SentencePiece tokenizer: https://github.com/keras-team/keras-nlp/issues/27. I will get to that in the next week or two, hopefully! Re https://farasa.qcri.org/, we could definitely support...

WordPieceTokenizer token splitting

`keep_pattern` is the regex of tokens to keep that you split on. `split_pattern` is the regex of tokens to split on. Sounds like you would like to split...
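To make the split/keep distinction concrete, here is a rough sketch in plain Python `re`. It is not the library's implementation, and `split_and_keep` is a hypothetical helper name; it only mimics the idea that text is split on one pattern, and that delimiters matching a second pattern survive as tokens of their own. The bracket-free punctuation class in the second call is an illustrative assumption.

```python
import re


def split_and_keep(text, split_pattern, keep_pattern):
    """Hypothetical helper: split on split_pattern, keep delimiters matching keep_pattern.

    Assumes split_pattern has no capturing groups of its own.
    """
    # Wrapping the pattern in a group makes re.split return the delimiters too,
    # so delimiters land at the odd indices of the result.
    pieces = re.split(f"({split_pattern})", text)
    tokens = []
    for i, piece in enumerate(pieces):
        if not piece:
            continue  # empty strings appear between adjacent delimiters
        is_delimiter = i % 2 == 1
        if is_delimiter and not re.fullmatch(keep_pattern, piece):
            continue  # split here, but drop the delimiter (e.g. whitespace)
        tokens.append(piece)
    return tokens


# Whitespace plus ASCII punctuation is split on; only punctuation is kept.
PUNCT = r"[!-/:-@\[-`{-~]"
default_tokens = split_and_keep("[START] Hello, world!", rf"\s|{PUNCT}", PUNCT)

# Same idea, with square brackets left out of the punctuation class,
# so a marker such as "[START]" stays in one piece.
PUNCT_NO_BRACKETS = r"[!-/:-@^-`{-~]"
custom_tokens = split_and_keep(
    "[START] Hello, world!", rf"\s|{PUNCT_NO_BRACKETS}", PUNCT_NO_BRACKETS
)

print(default_tokens)
print(custom_tokens)
```

With the first pattern the brackets become separate tokens; with the second they stay attached to `START`, while whitespace is dropped in both cases.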

WordPieceTokenizer token splitting

But given the overall use case you are talking about, it is probably easier to just pad with a start token after tokenization. Then there are no caveats of...

WordPieceTokenizer token splitting

Hmm, I'm not sure we would want to support people doing math on our default regex pattern, that would be a compat nightmare. Something like this would work ``` split_pattern="\s|[!-/:-@^-`{-~]",...

BERT example integration test

@chenmoneygithub @fchollet let me know what you think of this. We definitely need some sort of automated testing here. I think this could be a good template for integration tests...

BERT example integration test

I think I also like this as a forcing function for simple "out of box" use. Needing to write a single, smallish test that runs your whole training pipeline is...

BERT example integration test

Talked with @fchollet on this, we should do a few things. 1) Move as much logic as possible out of the runnable script files into `bert_model.py` (and potentially add a...

`TokenAndPositionEmbedding`, `TransformerEncoder` and `TransformerDecoder` can be saved, but prevent the model from being loaded

Thanks for filing! I think we could clear up this issue by adding `keras.utils.register_keras_serializable(package="keras_nlp")` annotations to our layers. This would support h5, but also force our tf-style saved model loading...

Add a keras.io guide for pretraining a transformer with keras-nlp

Guide is incoming https://github.com/keras-team/keras-io/pull/859

Add a keras.io guide for pretraining a transformer with keras-nlp

@ddofer this is incoming! And top priority for us actually.