Python2 crashes if tensorboard_logger is imported before torch

Open kukuruza opened this issue 6 years ago • 3 comments

Thanks for the great package, it has been really valuable to me. However, I recently came across a Python crash:

*** Error in `python': malloc(): memory corruption: 0x000000007842e4c0 ***
Aborted (core dumped)

Steps to reproduce. Running the script below causes the crash on the last line (forward pass of the network).

from tensorboard_logger import configure

import torch
from torch.autograd import Variable

mymodel = torch.nn.Sequential(torch.nn.Conv2d(3, 10, kernel_size=3, bias=True))
imgs = Variable(torch.zeros((1,3,64,64), dtype=torch.float32)).cuda()
mymodel.cuda()
mymodel(imgs)

I also found that switching the order of the imports solves the problem. The following works fine.

import torch
from torch.autograd import Variable

from tensorboard_logger import configure

mymodel = torch.nn.Sequential(torch.nn.Conv2d(3, 10, kernel_size=3, bias=True))
imgs = Variable(torch.zeros((1,3,64,64), dtype=torch.float32)).cuda()
mymodel.cuda()
mymodel(imgs)

If I am not using .cuda() in the code, any order works fine.

System:

Ubuntu 14.04.5 LTS
CUDA 8.0, V8.0.61

Packages:

python                    2.7.15               h33da82c_4    conda-forge
pytorch                   0.4.1                py27__9.0.176_7.1.2_2
tensorboard-logger        0.1.0

I installed them with

conda install pytorch torchvision -c pytorch
pip install tensorboard_logger

I assume the order of imports was tested before, so my only guess is that conda and pip don't work well together and load different versions of some package.
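One way to probe that guess is to compare which shared libraries end up loaded under each import order. The following is only a minimal sketch, assuming a Linux system where /proc/self/maps is readable; the helper names shared_libraries_from_maps and loaded_shared_libraries are made up for illustration and are not part of torch or tensorboard_logger. It does not prove a library mismatch causes the crash, it only makes differences visible.

```python
import os


def shared_libraries_from_maps(maps_text):
    """Return the sorted, unique shared-object paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        if name.endswith(".so") or ".so." in name:
            paths.add(path)
    return sorted(paths)


def loaded_shared_libraries(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and list the mapped shared objects."""
    with open("/proc/{}/maps".format(pid)) as maps_file:
        return shared_libraries_from_maps(maps_file.read())


# Hypothetical usage: place right after the imports in each variant of the
# script, save the output to a file, and diff the two files.
#
#     for path in loaded_shared_libraries():
#         print(path)
```

If the two lists differ, for example the same library name resolved from a conda directory in one run and from a system or pip directory in the other, that would be a concrete lead to follow up on.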

kukuruza avatar Dec 12 '18 15:12 kukuruza

Wow, that's quite a nasty bug! And thanks for reducing this to a small example.

If I am not using .cuda() in the code, any order works fine.

So the crash does not happen on import, right?

I tried running the first (problematic) version of the script, and it didn't crash for me (using Python 3.6 and the same package versions). I remember having issues with torch import order like this: https://github.com/pytorch/pytorch/issues/2083 but not memory corruption.

To sum up, I'm not sure I'll be able to help much here, sorry. tensorboard_logger is pure Python and is not supposed to do anything nasty, but I still can't explain why this error is happening. It could be an unrelated issue in pytorch or some other C library that is triggered only under specific conditions. If you can obtain a backtrace from the crash, that might help to narrow it down.

And thank you for the kind words about the library.

lopuhin avatar Dec 12 '18 16:12 lopuhin

That's correct, the crash happens on the last line, which is the forward pass of the network. (I edited the issue to make this clearer for others.)

I guess I just wanted to document this as an issue so that anyone who comes across similar behavior has one idea to try out.

And if I figure out the combination of factors that causes the problem, I'll comment here.

kukuruza avatar Dec 12 '18 17:12 kukuruza

Thanks @kukuruza, let's keep it open so that it's more visible in case someone else runs into this problem.

lopuhin avatar Dec 13 '18 07:12 lopuhin