Error with mxnet 1.8.0 dataloader

Open leotac opened this issue 4 years ago • 1 comments

With mxnet 1.8.0.post0, importing decord before mxnet causes an error using a dataloader.

import decord
import mxnet
from mxnet import gluon, image
from mxnet.gluon.data import DataLoader

class ImageDataset(gluon.data.Dataset):
    def __init__(self, paths):
        self._all_paths = paths

    def __len__(self):
        return len(self._all_paths)

    def __getitem__(self, item):
        image_path = self._all_paths[item]
        img = image.imread(image_path, to_rgb=True)
        return img

    def get_dataloader(self, **kwargs):
        return DataLoader(self, **kwargs)

print("decord", decord.__version__)
print("mxnet", mxnet.__version__)
paths = ['0.png', '1.png']
loader = ImageDataset(paths).get_dataloader(batch_size=2)
a, *_ = loader
print(a.shape, a.asnumpy().mean())

That snippet (the files obviously exist and are valid images) yields:

decord 0.5.2
mxnet 1.8.0
libpng error: bad parameters to zlib
libpng error: bad parameters to zlib
Traceback (most recent call last):
  File "repro.py", line 27, in <module>
    print(a.shape, a.asnumpy().mean())
  File "/home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 2566, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "../src/io/image_io.cc", line 160
MXNetError: Check failed: !dst.empty(): Decoding failed. Invalid image file.
[bt] (0) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/decord/libdecord.so(dmlc::StackTrace(unsigned long)+0x85) [0x7faeb7f3962c]
[bt] (1) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x36) [0x7faeb7f3991c]
[bt] (2) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::io::ImdecodeImpl(int, bool, void*, unsigned long, mxnet::NDArray*)+0x521) [0x7faeabb5b7a1]
[bt] (3) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x122296e) [0x7faeaba9f96e]
[bt] (4) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x44c) [0x7faeabaa9dbc]
[bt] (5) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xc4) [0x7faeabaabab4]
[bt] (6) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7faeabaa8eab]
[bt] (7) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/decord/libdecord.so(+0x166610) [0x7faeb7fa8610]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7faed03ea6db]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7faed011371f]

Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/base.py", line 587, in _notify_shutdown
    check_call(_LIB.MXNotifyShutdown())
  File "/home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "../src/io/image_io.cc", line 160
MXNetError: Check failed: !dst.empty(): Decoding failed. Invalid image file.
[bt] (0) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/decord/libdecord.so(dmlc::StackTrace(unsigned long)+0x85) [0x7faeb7f3962c]
[bt] (1) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x36) [0x7faeb7f3991c]
[bt] (2) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::io::ImdecodeImpl(int, bool, void*, unsigned long, mxnet::NDArray*)+0x521) [0x7faeabb5b7a1]
[bt] (3) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x122296e) [0x7faeaba9f96e]
[bt] (4) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x44c) [0x7faeabaa9dbc]
[bt] (5) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0xc4) [0x7faeabaabab4]
[bt] (6) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x3b) [0x7faeabaa8eab]
[bt] (7) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/decord/libdecord.so(+0x166610) [0x7faeb7fa8610]
[bt] (8) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7faed03ea6db]
[bt] (9) /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7faed011371f]

Swap the imports:

import mxnet
from mxnet import gluon, image
from mxnet.gluon.data import DataLoader
import decord

...

and everything works as expected (see below). By the way, with previous versions (I tested mxnet 1.5.1) everything works with either import order, as it should. The decord version doesn't seem to matter.

decord 0.5.2
mxnet 1.8.0
(2, 720, 1280, 3) 117.8157691333912
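
Until the underlying conflict is fixed, the import-order workaround can be guarded with a small check. The following is only a sketch: `mxnet_imported_first` is a hypothetical helper name, not part of decord or mxnet, and it can only detect the case where decord is already loaded and mxnet is not yet loaded. Once both modules are in `sys.modules`, the order in which they were imported is no longer visible.

```python
import sys
import warnings


def mxnet_imported_first(modules=None):
    """Return False (and warn) if decord is already loaded but mxnet is not.

    `modules` defaults to sys.modules; passing a dict makes the check testable.
    This helper is hypothetical and only illustrates the workaround.
    """
    modules = sys.modules if modules is None else modules
    if "decord" in modules and "mxnet" not in modules:
        warnings.warn(
            "decord was imported before mxnet; with mxnet 1.8.0 this order "
            "has been reported to break image decoding. Import mxnet first.",
            RuntimeWarning,
        )
        return False
    return True
```

Calling `mxnet_imported_first()` right before the first `import mxnet` in a project would surface the problematic order early instead of failing later inside the dataloader.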

leotac avatar Apr 30 '21 18:04 leotac

"libpng error: bad parameters to zlib". My guess is that the static libs shipped by decord and mxnet now conflict: mxnet now explicitly requires a newer version of libpng/zlib, and swapping the import order ensures that the latest libs are used. Decord is more tolerant of the versions, so there's no issue in that order.
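
One way to look for this kind of clash is to check which shared library each backtrace frame resolved to. In the trace from the report, the `dmlc::` logging frames reached from mxnet's `ImdecodeImpl` resolve inside `libdecord.so`, which is consistent with symbols from the two libraries overlapping, though it does not prove the libpng/zlib guess. The sketch below parses `[bt]` lines of the form shown in the report; `parse_bt_frames` is a hypothetical helper, and the regular expression assumes exactly that line format.

```python
import os
import re

# Matches lines like: [bt] (0) /path/lib.so(symbol+0x85) [0x7f...]
_BT_LINE = re.compile(r"^\[bt\] \((\d+)\) ([^(\s]+)\((.*)\) \[0x[0-9a-fA-F]+\]\s*$")


def parse_bt_frames(trace):
    """Return (frame index, library file name, symbol) for each [bt] line."""
    frames = []
    for line in trace.splitlines():
        match = _BT_LINE.match(line.strip())
        if match:
            index, path, symbol = match.groups()
            frames.append((int(index), os.path.basename(path), symbol))
    return frames


# Two frames copied from the traceback in the report.
SAMPLE = """\
[bt] (0) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/decord/libdecord.so(dmlc::StackTrace(unsigned long)+0x85) [0x7faeb7f3962c]
[bt] (2) /home/leonardo/miniconda3/envs/mx2/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::io::ImdecodeImpl(int, bool, void*, unsigned long, mxnet::NDArray*)+0x521) [0x7faeabb5b7a1]
"""

for index, library, symbol in parse_bt_frames(SAMPLE):
    print(index, library, symbol)
```

Frames whose symbol belongs to one project but whose library belongs to the other are the ones worth a closer look.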

To fully solve this error, I believe we need to either 1) update the PyPI release pipeline to include the latest versions of libpng/zlib, or 2) stop shipping binary wheels with static libs.
Option 2) is a bit awkward IMO.

zhreshold avatar Jun 14 '21 20:06 zhreshold