tensorboard_logger
Python2 crashes if tensorboard_logger is imported before torch
Thanks for the great package; it has been very valuable to me. However, I recently ran into a Python crash:
*** Error in `python': malloc(): memory corruption: 0x000000007842e4c0 ***
Aborted (core dumped)
Steps to reproduce. Running the script below causes the crash on the last line (forward pass of the network).
from tensorboard_logger import configure
import torch
from torch.autograd import Variable
mymodel = torch.nn.Sequential(torch.nn.Conv2d(3, 10, kernel_size=3, bias=True))
imgs = Variable(torch.zeros((1,3,64,64), dtype=torch.float32)).cuda()
mymodel.cuda()
mymodel(imgs)
I also found that switching the order of the imports solves the problem. The following works fine.
import torch
from torch.autograd import Variable
from tensorboard_logger import configure
mymodel = torch.nn.Sequential(torch.nn.Conv2d(3, 10, kernel_size=3, bias=True))
imgs = Variable(torch.zeros((1,3,64,64), dtype=torch.float32)).cuda()
mymodel.cuda()
mymodel(imgs)
If I am not using .cuda() in the code, any order works fine.
System:
Ubuntu 14.04.5 LTS
Cuda 8.0, V8.0.61
Packages:
python 2.7.15 h33da82c_4 conda-forge
pytorch 0.4.1 py27__9.0.176_7.1.2_2
tensorboard-logger 0.1.0
I installed them with
conda install pytorch torchvision -c pytorch
pip install tensorboard_logger
I assume the order of imports was tested before, so my only guess is that conda and pip don't work well together and load different versions of some package.
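One way to probe the guess above is to compare which shared libraries end up mapped into the process under each import order. The following is only a minimal sketch, assuming Linux (it reads /proc/self/maps). The helper names shared_objects and current_shared_objects are made up for illustration and are not part of torch or tensorboard_logger.

```python
import re


def shared_objects(maps_text):
    """Return the set of shared-library paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [path]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            # Keep entries such as libfoo.so or libfoo.so.1.2
            if re.search(r"\.so(\.\d+)*$", path):
                paths.add(path)
    return paths


def current_shared_objects():
    """Shared libraries currently mapped into this process (Linux only)."""
    with open("/proc/self/maps") as handle:
        return shared_objects(handle.read())
```

Printing sorted(current_shared_objects()) right after the imports in each of the two scripts and diffing the output would show whether some library is picked up from a different location depending on the import order. This only narrows things down; it does not by itself explain the memory corruption.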
Wow, that's quite a nasty bug! And thanks for reducing this to a small example.
If I am not using .cuda() in the code, any order works fine.
So the crash does not happen on import, right?
I tried running the first (problematic) version of the script, and it didn't crash for me (using Python 3.6 and the same package versions). I remember import-order issues with torch such as https://github.com/pytorch/pytorch/issues/2083, but not memory corruption.
To sum up, I'm not sure I'll be able to help much here, sorry. tensorboard_logger is pure Python and is not supposed to do anything nasty, so I can't explain why this error is happening. It could be an unrelated issue in pytorch or some other C library that is triggered only under specific conditions. If you can obtain a backtrace from the crash, that might help narrow it down.
And thank you for the kind words about the library.
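For the backtrace suggested above, one low-effort option is the faulthandler module, which dumps the Python-level traceback when the process receives a fatal signal such as SIGABRT. This is a sketch under assumptions: faulthandler ships with Python 3.3+, while on Python 2.7 it would have to be installed as the backport of the same name from PyPI. The helper name enable_crash_traceback and the log path are hypothetical. It shows Python frames only, and with a corrupted heap the dump itself may not complete, so running the script under gdb may still be needed for the C-level frames.

```python
import faulthandler


def enable_crash_traceback(log_path):
    """Write the Python traceback of all threads to log_path on a fatal signal."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # Return the file object so the caller keeps it open until the crash.
    return log_file
```

Calling enable_crash_traceback("crash.log") at the very top of the failing script, before any other import, and keeping the returned file object alive should leave the traceback in crash.log if the abort is caught.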
That's correct, the crash happens on the last line, which is the forward pass of the network. (I edited the issue to make this clearer for others.)
I guess I just wanted to document this as an issue so that anyone who comes across similar behavior has one idea to try out.
And if I figure out which combination of factors causes the problem, I'll comment here.
Thanks @kukuruza, let's keep it open so that it's more visible in case someone else runs into this problem.