
About GPU memory consumption.

Open AlbertHuyb opened this issue 2 years ago • 4 comments

When training with python train_chairs.py configs/train_chairs.yml, I noticed that batch_size=4 exceeds the memory of a single 2080 Ti GPU. I can only set batch_size to 1 on a single 2080 Ti, and even that consumes more than 10 GB of GPU memory.

I use tensorflow=2.3.0 because I noticed that 2.8.0 is not supported by TensorFlow Addons:

Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.2.0 and strictly below 2.4.0 (nightly versions are not supported). The versions of TensorFlow you are currently using is 2.8.0 and is not supported.
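A minimal way to check the mismatch, assuming only the version range quoted in the warning above (importing tensorflow_addons re-emits that warning when the installed TensorFlow falls outside the range it expects):

```python
import tensorflow as tf

# The warning above says the installed tensorflow-addons build expects
# TensorFlow >= 2.2.0 and < 2.4.0 (nightly builds excluded).
print("TensorFlow:", tf.__version__)

import tensorflow_addons as tfa  # prints the compatibility warning again if the range is violated
print("TensorFlow Addons:", tfa.__version__)
```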

AlbertHuyb avatar May 10 '22 03:05 AlbertHuyb

There's probably a memory leak somewhere...
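One way to check that suspicion is to log GPU memory per step; a sketch with a hypothetical helper (log_gpu_memory is not part of tf-raft, and tf.config.experimental.get_memory_info needs TF 2.5 or newer, while TF 2.4 only offers tf.config.experimental.get_memory_usage for the current value):

```python
import tensorflow as tf

def log_gpu_memory(step):
    # A steadily growing 'current' value across steps would point at a leak;
    # a flat value means the model/batch is simply large.
    info = tf.config.experimental.get_memory_info('GPU:0')
    print(f"step {step}: current={info['current'] / 2**20:.0f} MiB, "
          f"peak={info['peak'] / 2**20:.0f} MiB")

# Hypothetical placement inside whatever loop train_chairs.py runs:
# for step, batch in enumerate(dataset):
#     train_step(batch)
#     if step % 100 == 0:
#         log_gpu_memory(step)
```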

adeeb10abbas avatar May 10 '22 14:05 adeeb10abbas

There's probably a memory leak somewhere...

Could you please share your environment and GPU memory consumption? I'm quite new to TensorFlow 2 and feel rather confused.

Thanks for your help!

AlbertHuyb avatar May 10 '22 14:05 AlbertHuyb

After I set os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true", batch_size=1 occupies 10996 MiB and batch_size=2 also occupies 10996 MiB, while batch_size=3 runs out of memory (OOM).
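For reference, a short sketch of the same setting done two ways (the tf.config call below is the documented TF 2.x way to enable on-demand allocation and should behave like the environment variable; both must take effect before the first GPU op):

```python
import os
import tensorflow as tf

# Option 1: the environment variable used above, set before TensorFlow touches the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Option 2: the equivalent tf.config call; allocate GPU memory on demand
# instead of reserving nearly all of it up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```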

I'm using Python 3.7.13, TensorFlow 2.4.0, and cudatoolkit 11.0 on Ubuntu 18.04.

AlbertHuyb avatar May 10 '22 15:05 AlbertHuyb

This fixed it for me: https://github.com/daigo0927/tf-raft/pull/27

giulionf avatar Jun 28 '22 16:06 giulionf