benchmark
Redundant memory allocation may be the root cause of OOMs
Hi @xuzhao9,
While investigating the LLAMA_7b OOM issue, we found many redundant memory allocations that may not be necessary for the test.
1. `maybe_cast()` and `deepcopy_and_maybe_cast()` use deepcopy, which duplicates the GPU memory already allocated for the model. https://github.com/pytorch/benchmark/blob/main/userbenchmark/dynamo/dynamobench/common.py#L2400 https://github.com/pytorch/benchmark/blob/main/userbenchmark/dynamo/dynamobench/common.py#L2403
It looks like we need to be stricter about where deepcopy is used.
2. `validate_model()` uses deepcopy too. https://github.com/pytorch/benchmark/blob/main/userbenchmark/dynamo/dynamobench/common.py#L1918
After commenting out the unnecessary deepcopy() calls, we can run the LLAMA_7b model (which previously hit OOM, see https://github.com/pytorch/benchmark/issues/2051 ) on a single A100 40G.
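To illustrate the effect, here is a minimal sketch, not the dynamobench code. `TinyModel`, `maybe_copy`, `enable_copy`, and `resident_bytes` are hypothetical names made up for this example, and a plain `bytearray` stands in for the model weights on the GPU. It only shows that every deepcopy adds one more full copy of the weights, while reusing the same object does not.

```python
import copy


class TinyModel:
    """Hypothetical stand-in for a model; a bytearray plays the role of GPU weights."""

    def __init__(self, n_bytes):
        self.weights = bytearray(n_bytes)


def resident_bytes(*models):
    """Sum the sizes of the distinct weight buffers held by the given models."""
    unique = {id(m.weights): len(m.weights) for m in models}
    return sum(unique.values())


def maybe_copy(model, enable_copy=True):
    """Hypothetical opt-out: deep copy only when an independent replica is needed."""
    return copy.deepcopy(model) if enable_copy else model


model = TinyModel(1_000_000)
replica = maybe_copy(model, enable_copy=True)   # second full copy of the weights
alias = maybe_copy(model, enable_copy=False)    # same object, no extra memory

with_copy = resident_bytes(model, replica)
without_copy = resident_bytes(model, alias)
print(with_copy, without_copy)
```

Sharing the object instead of copying it is only safe when the run does not mutate the model's state, so an opt-out like this would have to be applied per call site rather than globally.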
Hope this information helps with fixing the OOM issues in this repo.
Thanks
dynamobench is owned by the PT2 team. As I understand it, the deepcopy is used for the accuracy check because some models are stateful. cc @desertfire: is there a way to turn off deepcopy in dynamobench?