Memory leak when logging a 3D histogram of a PyTorch tensor on GPU

Open ianpegg-bc opened this issue 4 years ago • 2 comments

Describe the Bug

The application runs out of memory and is killed when calling log_histogram_3d with a PyTorch tensor that lives on the GPU.

Expected behavior

Either of the following behaviors would be acceptable:

  • Comet automatically converts the tensor to a form it can use
  • Comet raises an informative exception
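
To illustrate the second option, here is a minimal sketch of the kind of guard that could produce an informative exception. The function name ensure_host_values is hypothetical and is not part of the Comet SDK; the only assumption about the input is that a PyTorch tensor exposes an is_cuda attribute.

```python
def ensure_host_values(values):
    """Hypothetical guard, not part of comet_ml: reject tensors that live on a CUDA device."""
    # torch.Tensor exposes `is_cuda`; lists, tuples and NumPy arrays do not,
    # so getattr falls back to False for them.
    if getattr(values, "is_cuda", False):
        raise TypeError(
            "log_histogram_3d received a tensor on a CUDA device; "
            "call .cpu() on it before logging."
        )
    return values
```

Failing fast like this would surface the problem immediately instead of letting the process grow until it is killed.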

Where is the issue?

  • [ ] Comet Python SDK
  • [ ] Comet UI
  • [x] Third Party Integrations (Huggingface, TensorboardX, PyTorch Lightning, etc.)

To Reproduce

import comet_ml
import torch

assert torch.cuda.is_available()
experiment = comet_ml.Experiment(project_name="test")

device = 'cuda'
# device = 'cpu'
x = torch.rand(100, device=device)

experiment.set_step(0)
experiment.log_histogram_3d(x, "x")

The issue goes away when you set device = 'cpu'.

Stack Trace

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

Stack trace when I interrupt the process while memory is still growing:

Traceback (most recent call last):
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/utils.py", line 1537, in fast_flatten
    items = numpy.array(items, dtype=float)
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/torch/_tensor.py", line 725, in __array__
    return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/utils.py", line 1543, in fast_flatten
    items = numpy.array([numpy.array(item) for item in items], dtype=float)
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/utils.py", line 1543, in <listcomp>
    items = numpy.array([numpy.array(item) for item in items], dtype=float)
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/torch/_tensor.py", line 723, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ian.pegg/projects/shining_software/src/shining_research/map_divergence_detection/debug.py", line 12, in <module>
    experiment.log_histogram_3d(x, "x")
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/experiment.py", line 2861, in log_histogram_3d
    histogram.add(values)
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/utils.py", line 956, in add
    values = fast_flatten(values)
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/utils.py", line 1550, in fast_flatten
    return numpy.array(flatten(items))
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/utils.py", line 1518, in flatten
    return list(lazy_flatten(items))
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/comet_ml/utils.py", line 1503, in lazy_flatten
    new_iterator = iter(value)
  File "/home/ian.pegg/miniconda3/envs/torch-nightly/lib/python3.9/site-packages/torch/_tensor.py", line 688, in __iter__
    if torch._C._get_tracing_state():
KeyboardInterrupt

Link to Comet Project/Experiment

https://www.comet.ml/ianpegg-bc/test

ianpegg-bc, Nov 12 '21 01:11

Thanks for catching this @ianpegg-bc. I'll have our engineering team look into this.

DN6, Nov 12 '21 17:11

@ianpegg-bc Following up here. I've created a ticket for the engineering team to address this. In the meantime, the workaround is to move the tensor to the CPU before logging it as a histogram, as you suggested.
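
A minimal sketch of that workaround, for anyone landing here. The helper name to_host_list is hypothetical (it is not a Comet or PyTorch function); it relies only on the detach(), cpu() and tolist() methods that torch.Tensor provides, and is duck-typed so it also accepts plain sequences.

```python
def to_host_list(values):
    """Return values as a plain Python list, copying a tensor to the CPU first if needed."""
    # Hypothetical helper: duck-typed so it runs without importing torch.
    # torch.Tensor provides detach(), cpu() and tolist().
    if all(hasattr(values, name) for name in ("detach", "cpu", "tolist")):
        return values.detach().cpu().tolist()
    return list(values)

# Usage with the objects from the reproduction script above:
# experiment.log_histogram_3d(to_host_list(x), "x")
```

Calling x.cpu() directly before logging should have the same effect; the helper just keeps the call site unchanged for inputs that are already on the host.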

DN6 avatar Nov 12 '21 17:11 DN6