automl Since EfficientDet requires TensorFlow > 2.8 we can't train anymore with CUDA

Open fitoule opened this issue 3 years ago • 4 comments

I have only one NVIDIA GPU. I was training with TensorFlow 2.5.2 because of the bug with GPU and multiprocessing.

TF 2.8 without a child process => works, but leaks memory :(
TF 2.8 with a child process => CUDA error on the first epoch because the GPU has already been taken by the main process: https://github.com/google/automl/issues/855
TF 2.5.2 with a child process => no longer works since the determinism fix

It worked with TensorFlow up to 2.5.2, but EfficientDet now requires TF > 2.8, so I am stuck. I think I have to find the code from before the "determinism" change.
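The "child process" setup described above isolates GPU work in a process that exits after each epoch, so its memory is released on exit and the parent never initializes CUDA. Below is a minimal sketch of that pattern, assuming one fresh interpreter per epoch. `run_isolated` and the inline snippet are hypothetical and are not part of the EfficientDet code.

```python
import subprocess
import sys


def run_isolated(code: str) -> str:
    """Run a Python snippet in a fresh interpreter and return its stdout.

    The parent never imports TensorFlow here, so it never claims the GPU;
    whatever the child allocates is released when the child exits.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child fails
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for epoch in range(3):
        # Placeholder for a real one-epoch training call that restores from
        # and saves to a checkpoint (hypothetical, not shown here).
        print(run_isolated(f"print('finished epoch {epoch}')"))
```

In a real run the snippet would be replaced by a script that trains for one epoch from the latest checkpoint; that script is an assumption and not something this thread provides.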

Apr 15 '22 08:04 fitoule

Migrate to tf2.
Set num_epochs=1 and num_examples_per_epoch=num_epochs * num_examples
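One reading of this suggestion, sketched here as an assumption rather than the maintainer's exact recipe, is to fold all epochs into a single long one so that the per-epoch machinery runs only once. `num_epochs` and `num_examples_per_epoch` are the names used above; `collapse_epochs` is a hypothetical helper, and the `key=value` rendering is only illustrative.

```python
def collapse_epochs(num_epochs: int, num_examples: int) -> dict:
    """Return overrides that process the same total number of examples in one epoch."""
    if num_epochs < 1 or num_examples < 1:
        raise ValueError("num_epochs and num_examples must be positive")
    return {
        "num_epochs": 1,
        "num_examples_per_epoch": num_epochs * num_examples,
    }


# Example: 50 epochs over 5000 examples become one epoch of 250000 examples.
overrides = collapse_epochs(num_epochs=50, num_examples=5000)
print(",".join(f"{key}={value}" for key, value in overrides.items()))
```

The total number of training examples seen stays the same; anything that is tied to epoch boundaries then fires once for the whole run instead of once per original epoch.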

Apr 15 '22 09:04 fsx950223

Do you mean I need to use the code under efficientdet/tf2/train.py, or migrate efficientdet/main.py myself?

Thank you.

Apr 19 '22 07:04 fitoule

@fitoule you mentioned a memory leak. I am facing a memory leak too. Can you give more info?

May 08 '22 19:05 exx8

I ran into the same problem. I used traineval mode with TensorFlow 2.10 (then 2.13), and in both cases there was a memory leak after the first epoch. Training was fine, but during evaluation CocoCallback probably causes the leak. I commented out this line (https://github.com/google/automl/blob/master/efficientdet/tf2/train_lib.py#L220) and everything is fine.
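An alternative to editing the library source is to filter the callback list before it is passed to training. This is only a sketch, assuming the callbacks are available as a plain Python list; `drop_callbacks` is a hypothetical helper, the classes below are stand-ins so the example runs without TensorFlow, and the name to drop must match whatever `train_lib.py` actually defines.

```python
def drop_callbacks(callbacks, unwanted_names):
    """Return the callbacks whose class name is not listed (case-insensitive)."""
    unwanted = {name.lower() for name in unwanted_names}
    return [cb for cb in callbacks if type(cb).__name__.lower() not in unwanted]


# Stand-in classes for illustration only; they are not the real callbacks.
class CocoCallback:
    pass


class CheckpointCallback:
    pass


kept = drop_callbacks([CocoCallback(), CheckpointCallback()], ["CocoCallback"])
print([type(cb).__name__ for cb in kept])
```

Dropping the callback presumably also drops whatever evaluation metrics it reports, so this trades those metrics for stable memory use.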

Nov 21 '23 13:11 mateusz-wozny
