maxtext
Compatibility issue with tensorflow>=2.15.1 on GPU
Hi team,
I'm having an issue launching the pretraining job with TensorFlow 2.15 or above. TensorFlow 2.15 crashes immediately with a segmentation fault. With the latest TensorFlow 2.16.1, GPU memory usage in one of the data loading processes grows without bound (or to nearly 100%), leading to CUDA OOM and cascading failures. One quick workaround is to install an older TensorFlow version locally, e.g.:
pip install tensorflow==2.13.1 tensorflow-text==2.13.0
Also works:
pip install tensorflow==2.14.1 tensorflow-text==2.14.0
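In case it helps anyone hitting the same thing, here is a rough sketch of a pre-launch check that flags an affected TensorFlow install before the job starts. It assumes the cutoff is 2.15, based only on the versions mentioned in this report (2.13.1 and 2.14.1 work, 2.15 and 2.16.1 fail). The helper names (`parse_major_minor`, `is_affected`, `installed_tensorflow_version`) are made up for illustration and are not part of maxtext.

```python
from importlib import metadata
from typing import Optional, Tuple

# Assumption: first affected release line, inferred from this report only.
FIRST_AFFECTED = (2, 15)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.16.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_affected(version: str) -> bool:
    """True if the version is at or above the first release reported as failing."""
    return parse_major_minor(version) >= FIRST_AFFECTED


def installed_tensorflow_version() -> Optional[str]:
    """Return the installed tensorflow version, or None if it is not installed."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_tensorflow_version()
    if version is None:
        print("tensorflow is not installed")
    elif is_affected(version):
        print(f"tensorflow {version} may hit the GPU issue described above")
    else:
        print(f"tensorflow {version} is in the range reported as working")
```

This only reads package metadata, so it does not import TensorFlow or touch the GPU.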