
Compatibility issue with tensorflow>=2.15.1 on GPU

Open chajath opened this issue 11 months ago • 0 comments

Hi team,

I'm having an issue launching the pretraining job with TensorFlow 2.15 or above. TensorFlow 2.15 segfaults immediately. With the latest TensorFlow 2.16.1, GPU memory usage in one of the data loading processes grows without bound to nearly 100%, leading to CUDA OOM and cascading failures. One quick workaround is to locally install a lower TensorFlow version, e.g.

pip install tensorflow==2.13.1 tensorflow-text==2.13.0

Also works:

pip install tensorflow==2.14.1 tensorflow-text==2.14.0
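
To catch the problem before launching a job, the installed version can be checked up front. This is a minimal sketch, not part of maxtext: the helper names (`parse_major_minor`, `is_affected`, `check_installed_tensorflow`) are hypothetical, and the 2.15 cutoff is an assumption taken only from the versions reported above (2.13.1 and 2.14.1 work, 2.15 and 2.16.1 fail).

```python
from importlib import metadata

# Assumption based on this report: 2.15 and later misbehave on GPU,
# while 2.13.1 and 2.14.1 are known to work.
FIRST_AFFECTED = (2, 15)


def parse_major_minor(version):
    """Return (major, minor) as integers from a string such as '2.16.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_affected(version):
    """True if the version falls in the range reported as problematic."""
    return parse_major_minor(version) >= FIRST_AFFECTED


def check_installed_tensorflow():
    """Return (version, affected) for the installed tensorflow, or None if absent."""
    try:
        version = metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None
    return version, is_affected(version)


if __name__ == "__main__":
    result = check_installed_tensorflow()
    if result is None:
        print("tensorflow is not installed")
    else:
        version, affected = result
        status = "may hit this issue" if affected else "matches a working version range"
        print(f"tensorflow {version}: {status}")
```

Running the script before the pretraining job gives a quick warning instead of a segfault or a CUDA OOM later on. It only inspects package metadata, so it does not import TensorFlow or touch the GPU.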

chajath, Mar 13 '24 21:03