
Memory access broken when grayscaling with tjTransform on Windows

Open georg-jung opened this issue 4 years ago • 5 comments

Context: I want to port the .NET package Quamotion.TurboJpegWrapper so that it supports mozjpeg and comes with precompiled binaries. My code is hosted here on GitHub with CI on Azure DevOps.

Problem: While I made some progress, I noticed that one of the test cases of the project I forked keeps failing. I looked into this and found it quite hard to debug, but I'm somewhat confident that it's a bug in mozjpeg. Failing CI runs in which the problem occurred can be found here.

What I do: I call tjTransform in this test case. Parameters: 4. count = 1; flags = 0; one transform with options = 8 (TJXOPT_GRAY) and null/0 for all other parameters. I tried options 2 and 3 for buffer allocation (allocating no buffer/allocating the worst-case buffer in my code), both resulting in the similar behaviour outlined below. I used this image for testing.

Behaviour:

  • I see different kinds of memory access violations
    • seemingly most common is Bogus virtual array access
    • Virtual array controller messed up happens sometimes
    • DCT coefficient out of range happens too, possibly because wrong data is read? My understanding of how JPEG works is rather limited
    • when running with Visual Studio attached as a debugger I get errors like 0x00007FF83C430D62 (turbojpeg.dll) in dotnet.exe: 0xC0000005: Zugriffsverletzung beim Lesen an Position 0xFFFFFF89FFFFFFD0. (English: access violation while reading at position). The address being read (the last one in the message) changes between runs, but it always resembles the one above: an easy-to-remember value with many repeating digits.
    • ~20% of my runs just work as expected
  • seems to fail only on Windows
  • does not always fail, but in my configuration >80% of my local runs and CI runs fail

Why do I think this is a mozjpeg bug? When I replace turbojpeg.dll with the libjpeg-turbo one I downloaded here, the behaviour changes and 100% of the runs work as expected.

georg-jung avatar Feb 18 '20 17:02 georg-jung

Is there anything further I can do to support fixing this?

FYI: My above-mentioned MozJPEG wrapper for .NET is live and on NuGet now.

georg-jung avatar Mar 12 '20 19:03 georg-jung

Is this an ABI break, or a bug?

Can you test it with a dll you compile yourself? Does it work correctly if you link a static library?

kornelski avatar Mar 13 '20 15:03 kornelski

The DLL was compiled using the same VM image in CI that the tests run on afterwards. I don't use precompiled versions on Windows or macOS. Complete logs of the native compile and the failed test run can be found here. The same happens when I compile MozJPEG locally using vcpkg and run the tests on my local machine.

Given that I'm consuming the library from .NET, I don't think I'll be able to link it statically, or do you mean something different?

Thanks in advance

georg-jung avatar Mar 13 '20 15:03 georg-jung

@kornelski Is there anything more I can do to help find a solution for this?

georg-jung avatar Mar 27 '20 22:03 georg-jung

Sorry, I didn't find time to look at this.

To me it's still unclear whether it's an ABI incompatibility or a bug, because you're using a DLL. There are lots of crash-inducing things that can be created at a DLL boundary, and I didn't test DLLs.

If you can reproduce the crash without any DLL involved (in plain C if necessary), it would help eliminate this as the cause.

kornelski avatar Mar 30 '20 14:03 kornelski