
Benchmark test failing due to CUDA out of memory, GPU 8GB

Open andrewtvuong opened this issue 4 years ago • 1 comment

RTX 3070 Ti 8GB, NVIDIA-SMI 460.91.03, Driver Version: 460.91.03, CUDA Version: 11.2

pytest test_bench.py --ignore_machine_config

Is 8GB too small for hobbyists?

Do we need to add the following to the script depending on GPU memory?

import gc
import torch

gc.collect()                # drop Python-side references to freed tensors
torch.cuda.empty_cache()    # return cached blocks to the CUDA driver
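Cache clearing alone often isn't enough if the default batch size simply doesn't fit. A common workaround is to halve the batch size whenever a run hits an out-of-memory error. Below is a minimal, framework-agnostic sketch of that pattern; `run_with_oom_fallback` is a hypothetical helper, not part of torchbenchmark:

```python
def run_with_oom_fallback(run_fn, batch_size, min_bs=1):
    """Call run_fn(batch_size), halving batch_size on OOM until it fits.

    run_fn is any callable that raises RuntimeError containing
    "out of memory" when the batch does not fit on the GPU
    (this matches the message PyTorch raises on CUDA OOM).
    """
    while True:
        try:
            return run_fn(batch_size)
        except RuntimeError as e:
            if "out of memory" not in str(e) or batch_size <= min_bs:
                raise  # not an OOM, or nothing left to shrink
            batch_size //= 2
            # With PyTorch you would also call gc.collect() and
            # torch.cuda.empty_cache() here before retrying.
```

This trades fidelity for robustness: the measured throughput at a reduced batch size is no longer comparable to the reference configuration, so it suits smoke-testing on small GPUs rather than benchmarking.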

andrewtvuong avatar Dec 18 '21 21:12 andrewtvuong

In the past few months, we increased the default batch sizes for most of the models to follow these principles:

  • Train batch sizes should be consistent with either the research paper or a reference implementation (such as the default value in its GitHub repo);
  • Eval batch sizes should be the smallest batch size that saturates the GPU on our CI platform (an NVIDIA T4 with 16GB).

This is to simulate the GPU utilization and framework overhead that occur in real-world training/inference scenarios.

So yes, unfortunately the default batch size will be too large if you are using an 8GB GPU. However, most models now support train_bs and eval_bs arguments, which allow the framework to run the model with smaller batch sizes. We welcome user contributions to add an option, e.g., "pytest test_bench.py --single-batch-size", to run the suite with batch_size==1 for all the models.
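For anyone interested in contributing that option, pytest lets a suite register custom command-line flags via the `pytest_addoption` hook in conftest.py. The sketch below shows how a `--single-batch-size` flag could be wired up; the flag and the `pick_batch_size` helper are hypothetical (this is not existing torchbenchmark code):

```python
# conftest.py sketch for the proposed --single-batch-size flag (hypothetical)

def pytest_addoption(parser):
    # pytest calls this hook at startup; tests can later read the value
    # with request.config.getoption("--single-batch-size").
    parser.addoption(
        "--single-batch-size",
        action="store_true",
        default=False,
        help="Run every model with batch_size=1",
    )

def pick_batch_size(single_batch_size, default_bs):
    # Hypothetical helper: collapse to batch_size=1 when the flag is set,
    # otherwise keep the model's default.
    return 1 if single_batch_size else default_bs
```

A test would then pass `pick_batch_size(request.config.getoption("--single-batch-size"), model_default)` into the model's train_bs/eval_bs arguments.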

xuzhao9 avatar Dec 20 '21 20:12 xuzhao9