
[CodeCamp2023-471] Support for automatic selection of a batch size that maximizes the utilization of GPU memory.

Open LALBJ opened this issue 1 year ago • 1 comment

Motivation

Automatically adjust the batch size to prevent OOM errors. See more details in https://github.com/open-mmlab/mmengine/issues/1220

Modification

To support automatic batch size selection, add an `auto_batchsize` parameter to the Runner. When it is enabled, the Runner probes for a batch size that fits in GPU memory and runs training with that value, as sketched below.
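
A minimal usage sketch of the proposed interface. The `auto_batchsize` flag is the parameter proposed in this PR and is not part of the released mmengine API; the toy model and dataset are placeholders for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset
from mmengine.model import BaseModel
from mmengine.runner import Runner


class ToyDataset(Dataset):
    """Random regression data, just to make the example runnable."""

    def __init__(self, n=64):
        self.data = torch.rand(n, 2)
        self.label = torch.rand(n, 1)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return dict(inputs=self.data[idx], target=self.label[idx])


class ToyModel(BaseModel):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)

    def forward(self, inputs, target, mode='loss'):
        pred = self.linear(inputs)
        if mode == 'loss':
            return dict(loss=nn.functional.mse_loss(pred, target))
        return pred


runner = Runner(
    model=ToyModel(),
    work_dir='./work_dir',
    train_dataloader=dict(
        batch_size=256,  # initial (upper-bound) batch size
        dataset=ToyDataset(),
        sampler=dict(type='DefaultSampler', shuffle=True),
        collate_fn=dict(type='default_collate'),
    ),
    train_cfg=dict(by_epoch=True, max_epochs=1),
    optim_wrapper=dict(optimizer=dict(type='SGD', lr=0.01)),
    auto_batchsize=True,  # proposed flag: halve the batch size on CUDA OOM
)
runner.train()
```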

To implement this feature, add a `_check_batchsize` method that is invoked from `runner.train()`. This method monitors execution for CUDA OOM errors; whenever one is caught, the batch size is halved and training is retried with the smaller value.
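
A minimal sketch of the halve-on-OOM strategy described above. The name `_check_batchsize` comes from this PR; the stand-alone helper below is an illustrative assumption, not the actual Runner implementation.

```python
import torch


def _is_cuda_oom(err: RuntimeError) -> bool:
    """Heuristically decide whether a RuntimeError is a CUDA OOM."""
    return 'out of memory' in str(err).lower()


def check_batchsize(run_one_step, batch_size: int, min_batch_size: int = 1) -> int:
    """Return the largest batch size <= `batch_size` for which `run_one_step`
    completes without a CUDA OOM, halving the batch size on every failure."""
    while batch_size >= min_batch_size:
        try:
            run_one_step(batch_size)  # e.g. one forward/backward pass
            return batch_size
        except RuntimeError as err:
            if not _is_cuda_oom(err):
                raise  # unrelated error: re-raise
            # free cached blocks before retrying with a smaller batch
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError('Could not find a batch size that fits in GPU memory.')


# Example: probe with a dummy step that "runs out of memory" above 64 samples.
def dummy_step(bs: int) -> None:
    if bs > 64:
        raise RuntimeError('CUDA out of memory. Tried to allocate ...')


print(check_batchsize(dummy_step, batch_size=256))  # -> 64
```

In the actual Runner, `run_one_step` would correspond to a real training iteration, so the probe measures memory usage under the same conditions as training.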

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

LALBJ · Sep 11 '23 08:09

CLA assistant check
All committers have signed the CLA.

CLAassistant · Sep 11 '23 08:09