Add a byte or character level seq to seq example on keras.io
This could be very similar in structure to the LSTM seq2seq guide on keras.io, but show using either the ByteTokenizer or UnicodeCharacterTokenizer (or both).
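Roughly the tokenizer usage we'd want to show (a minimal sketch following the `keras_nlp.tokenizers` API; double-check the current signatures):

```python
import keras_nlp

# Byte-level tokenization: fixed vocabulary of 256 byte values, no fitting step.
byte_tokenizer = keras_nlp.tokenizers.ByteTokenizer()
tokens = byte_tokenizer("hello")           # int tensor of byte ids
text = byte_tokenizer.detokenize(tokens)   # round-trips back to "hello"

# Character-level tokenization: ids are Unicode code points,
# so non-ASCII characters stay a single token.
char_tokenizer = keras_nlp.tokenizers.UnicodeCharacterTokenizer()
char_tokens = char_tokenizer("héllo")
```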
We should demo training a model with preprocessing applied in tf.data, and then generating text using our new text generation util.
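Sketch of the tf.data + generation flow I have in mind. `SEQ_LEN` and `model` are placeholders, and the `greedy_search` call is my assumption of the util's signature, so verify against `keras_nlp.utils` before writing the guide:

```python
import tensorflow as tf
import keras_nlp

tokenizer = keras_nlp.tokenizers.ByteTokenizer()
SEQ_LEN = 64  # placeholder sequence length for the example

def preprocess(text):
    # Tokenize inside the tf.data pipeline, then pad/truncate to a fixed length.
    tokens = tokenizer(text)[:SEQ_LEN]
    tokens = tf.pad(tokens, [[0, SEQ_LEN - tf.shape(tokens)[0]]])
    # Features are the sequence, labels are the sequence shifted by one token.
    return tokens[:-1], tokens[1:]

ds = (
    tf.data.Dataset.from_tensor_slices(["a tiny text corpus..."])
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# After model.fit(ds, ...), generate with the greedy search util.
def token_probability_fn(inputs):
    # Next-token probabilities from the trained `model` (defined elsewhere).
    return model(inputs)[:, -1, :]

prompt = tokenizer("once upon a time")
generated = keras_nlp.utils.greedy_search(
    token_probability_fn, prompt, max_length=128
)
print(tokenizer.detokenize(generated))
```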
Assigning myself as a placeholder, I believe we may already have some people to work on this.
Hey I can take this up
@mattdangerw Just to confirm, should I use the same LSTM model, or should I use KerasNLP's encoder-decoder components?
No strong preference from me for the model. This guide will be more about tokenizers and greedy_search than the model itself.
LSTM could be nice and simple, but if you can make a small transformer that performs better and reads better, that is fine too! Whatever you prefer.
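For reference, the small transformer option could look something like this. Just a rough sketch: the layer sizes are placeholders, and it assumes the shifted-by-one inputs from the tf.data pipeline above:

```python
from tensorflow import keras
import keras_nlp

VOCAB_SIZE = 256   # ByteTokenizer vocabulary size
SEQ_LEN = 63       # one shorter than the padded length, because of the shift

inputs = keras.Input(shape=(SEQ_LEN,), dtype="int32")
x = keras_nlp.layers.TokenAndPositionEmbedding(
    vocabulary_size=VOCAB_SIZE, sequence_length=SEQ_LEN, embedding_dim=128
)(inputs)
# Called with a single sequence, TransformerDecoder runs causal self-attention only.
x = keras_nlp.layers.TransformerDecoder(intermediate_dim=256, num_heads=4)(x)
outputs = keras.layers.Dense(VOCAB_SIZE, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```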