
Add a byte- or character-level seq2seq example on keras.io

Open mattdangerw opened this issue 3 years ago • 4 comments

This could be very similar in structure to the LSTM seq2seq guide on keras.io, but should use either the ByteTokenizer or the UnicodeCharacterTokenizer (or both).
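
Both tokenizers are vocabulary-free, so the demo wouldn't need any vocab files. A rough, untested sketch (exact class names and defaults may differ between releases):

```python
import keras_nlp

# Byte-level: strings become sequences of raw byte ids (0-255),
# so there is no vocabulary to build and no OOV tokens.
byte_tokenizer = keras_nlp.tokenizers.ByteTokenizer()
print(byte_tokenizer("hello"))  # -> [104, 101, 108, 108, 111]

# Character-level: tokens are Unicode codepoints instead of bytes.
char_tokenizer = keras_nlp.tokenizers.UnicodeCharacterTokenizer()
print(char_tokenizer("héllo"))  # -> [104, 233, 108, 108, 111]

# Both can invert the mapping, which we need to decode generated ids.
print(byte_tokenizer.detokenize(byte_tokenizer("hello")))  # -> "hello"
```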

We should demo training a model with preprocessing applied in tf.data, and then generating text with our new text generation util.
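
Concretely, the guide could be shaped something like this. Untested sketch: it assumes a trained next-token `model`, a hypothetical `train.txt` data file, a placeholder sequence length, and a `greedy_search(token_probability_fn, prompt, max_length)` signature that accepts a 1-D prompt:

```python
import tensorflow as tf
import keras_nlp

tokenizer = keras_nlp.tokenizers.ByteTokenizer()
SEQ_LEN = 64  # placeholder training sequence length

def preprocess(text):
    # Tokenize on the fly inside the tf.data pipeline, then split into
    # (input, target) pairs shifted by one token for next-token prediction.
    tokens = tokenizer(text)[: SEQ_LEN + 1]
    return tokens[:-1], tokens[1:]

ds = (
    tf.data.TextLineDataset("train.txt")  # hypothetical data file
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .padded_batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# After training, wrap the model so the generation util can query it for
# next-token probabilities given the tokens generated so far.
def token_probability_fn(inputs):
    return model(inputs)[:, -1, :]  # distribution for the next token only

prompt = tokenizer("once upon a time")
generated = keras_nlp.utils.greedy_search(
    token_probability_fn, prompt, max_length=200
)
print(tokenizer.detokenize(generated))
```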

mattdangerw avatar May 19 '22 00:05 mattdangerw

Assigning myself as a placeholder; I believe we may already have some people lined up to work on this.

mattdangerw avatar May 19 '22 00:05 mattdangerw

Hey, I can take this up!

aflah02 avatar May 23 '22 17:05 aflah02

@mattdangerw Just to confirm, should I use the same LSTM model, or should I use KerasNLP's encoder-decoder components?

aflah02 avatar May 28 '22 05:05 aflah02

No strong preference from me for the model. This guide will be more about tokenizers and greedy_search than the model itself.

An LSTM could be nice and simple, but if you can make a small transformer that performs better and reads better, that is fine too! Whatever you prefer.
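
For scale, a byte-level LSTM baseline could be tiny. The layer sizes below are placeholders, not tuned values:

```python
from tensorflow import keras

VOCAB_SIZE = 256  # full byte vocabulary from ByteTokenizer

# Predicts a distribution over the next byte at every position,
# matching the shifted (input, target) pairs from the tf.data pipeline.
model = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, 64),
    keras.layers.LSTM(128, return_sequences=True),
    keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```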

mattdangerw avatar Jun 01 '22 21:06 mattdangerw