Matt Watson
This is mostly implemented, but still needs a little work. I'll push code shortly.
This will probably have some failing tests, just seeing how it does. Need to add some unit tests for a few bugs I spotted in our converters still.
We should also keep the docstring for the method on the `Backbone` base class. And factor out all the error checking somehow. That way the per model code here could...
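A minimal sketch of what this could look like, with the docstring and shared error checking living on the base class so per-model subclasses only supply their own constructor. All class and method names here are hypothetical, not the actual KerasNLP API:

```python
# Hypothetical sketch: the docstring and validation live once on the base
# class; per-model code does not repeat them.
class Backbone:
    @classmethod
    def from_config(cls, config):
        """Instantiate a backbone from a config dict.

        Docstring is kept here on the `Backbone` base class, so every
        subclass inherits it.
        """
        cls._validate_config(config)  # factored-out error checking
        return cls(**config)

    @classmethod
    def _validate_config(cls, config):
        # Shared validation, written once instead of per model.
        if not isinstance(config, dict):
            raise TypeError(
                f"`config` must be a dict. Received: {config!r}"
            )


class HypotheticalModelBackbone(Backbone):
    # Per-model code shrinks to just the constructor.
    def __init__(self, hidden_dim=8):
        self.hidden_dim = hidden_dim
```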
Contributions are welcome here, but this is a fairly abstract problem that would need some scouting out first. We could try to leverage Keras' DataAdapter here, I'm not sure how...
@SamanehSaadat @divyashreepathihalli leaving you both assigned here so we can monitor this issue for comments and new contributors. I've pinned it to the top of our issue list (following Keras).
Looked at this a bit. I think #1861 will be an important precursor work to make implementing this reasonable. I also think we might want to consider starting on some...
It's unclear to me whether Jetstream supports tokenization beyond sentencepiece and gpt-style-bpe, see [this](https://github.com/google/maxtext/blob/5af84912f4d11f356ea9929950faa7c50b12ae85/MaxText/maxengine.py#L358-L363) for maxtext. This is something to look into.
The only thing I could see that we might be able to control from the Keras side is, `max_queue` in [create_file_writer](https://www.tensorflow.org/api_docs/python/tf/summary/create_file_writer). We could try setting that to a larger value...
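For reference, a sketch of what bumping that buffer would look like (the `max_queue` parameter is real, but the value of 100 and the log directory are just placeholders):

```python
import tensorflow as tf

# `max_queue` controls how many summaries are buffered in memory before
# being flushed to disk; the default is 10. A larger value trades memory
# for fewer flushes.
writer = tf.summary.create_file_writer("/tmp/logs", max_queue=100)

with writer.as_default():
    tf.summary.scalar("loss", 0.5, step=0)
writer.flush()
```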
@grasskin cc'ing you here as well in case you have more context.
Probably an issue with the `cudnn`-specific implementation on the tf backend, which is pretty dense. I will take a look.