rawspeed Improve/cleanup the multi-threading support

The pthreads based multi-threading code should be improved (IMHO). I see two options:

keep the internal 'explicit' interface more or less like it is but switch from pthreads to c++11 threads.
remove the 'explicit' interface, which should simplify the code significantly and switch to OpenMP where applicable.

I would strongly prefer the second option as is would reduce the complexity of the code and thereby make it a lot easier or even make it possible for the first time to add it where it currently is not. E.g. CR2 files could potentially be decoded in parallel when they are composed of multiple strips. I have not looked into this properly, just wanted to bring up the idea and get feedback from others.

If I see it correctly darktable is also embracing the OpenMP methodology. With recent Apple/Clang compilers supporting it as well platform support should be sufficient.

Jan 07 '17 19:01 axxel

I'd prefer to avoid OpenMP for library code. pthreads/etc has been around for a while, and is widely supported. I can not say the sample about OpenMP. clang has only recently (3.8/3.9) gained more-or-less full support for openmp...
I'm somewhat strongly against C++11 threads because they are just too new feature. Not before we hard-depend on C++14. (I remember reading musl people swear about them.)

Jan 07 '17 19:01 LebedevRI

I'd prefer to avoid OpenMP for library code. pthreads/etc has been around for a while, and is widely supported. I can not say the sample about OpenMP. clang has only recently (3.8/3.9) gained more-or-less full support for openmp...

That is true. So replacing the existing code with openmp would potentially make the library less good for some people.

I'm somewhat strongly against C++11 threads because they are just too new feature. Not before we hard-depend on C++14. (I remember reading musl people swear about them.)

That statement surprises me for 3 reasons:

c++11 is 5 years old. saying it is new just does not seem to accurately describe the situation. :)
this argument would be valid for all c++11 features.
you contradict yourself by suggesting to go for c++14, which will always be newer than c++11?

Seriously: claiming that c++11 threads are somehow unreliable or not trustworthy without some sound technical evidence is not very convincing.

But there is also a 3rd approach possible: Start using openmp where there is no multi-threading at all right now and real gain can be achieved. (As it turns out: my understanding of the slices in CR2 files was inadequate. Unfortunately the different slices are still encoded as one big blob of JPEG data. I was hoping they would encode it as individual smaller JPEG blobs which would have allowed to decode them in parallel.) This way we can eventually settle on OpenMP while not breaking the threading features that are currently implemented. I think OpenMP is definitively the better tool for "parallelizing" large, uniform loops like pixel processing stuff than explicit thread object juggling is. So my interest in porting the pthreads code to c++11 threads is very limited. I'd actually consider it a waste of time, now that I thought more about it.

Question: is the phthreads code even working on windows anymore? Now that all the packaged windows stuff has been (rightly so!) removed?

Jan 11 '17 12:01 axxel

That statement surprises me for 3 reasons: 3. you contradict yourself by suggesting to go for c++14, which will always be newer than c++11?

I of course meant "once we deem that c++14 is old-enough, and widespread enough, then we can do -std=c++14". And by that time, c++11 will be older and even more widespread, and hopefully better-supported. Does that sound contradicting even now?

Seriously: claiming that c++11 threads are somehow unreliable or not trustworthy without some sound technical evidence is not very convincing.

True, that was a bit of FUD and "so i heard this somewhere". Right away i have no further proofs.

I think OpenMP is definitively the better tool for "parallelizing" large, uniform loops like pixel processing stuff than explicit thread object juggling is.

Yep, absolutely. That is why all dt modules are using OpenMP.

Question: is the phthreads code even working on windows anymore? Now that all the packaged windows stuff has been (rightly so!) removed?

No clue to be honest, not really my target platform. https://ci.appveyor.com/project/LebedevRI/rawspeed/build/107/job/tenjj8d9ei4uxbyf#L444 claims that the win builds do build with threads, and since the build does not fail, i think it works...

Jan 11 '17 12:01 LebedevRI

Does that sound contradicting even now?

No :).

True, that was a bit of FUD and "so i heard this somewhere". Right away i have no further proofs.

Can we agree then that using std::thread (as well as every other c++11 feature) is generally fine since you decided to require a c++11 compiler in your cmake build files?

Note: I have no plans to introduce std::thread right now.

Yep, absolutely. That is why all dt modules are using OpenMP.

Nice.

No clue to be honest, not really my target platform. https://ci.appveyor.com/project/LebedevRI/rawspeed/build/107/job/tenjj8d9ei4uxbyf#L444 claims that the win builds do build with threads, and since the build does not fail, i think it works...

Ahh, I have not looked at the specifics of the tool chain you set up there but mingw 6.2 looks like a very fine choice to me. This is almost a c++17 compiler ;). I would also assume the pthreads code works in those builds. The question is whether any user (except Klaus) is currently depending on VS and what they have to say about dropping direct build support. (again: I am totally with you on this decision).

Jan 11 '17 13:01 axxel

Ahh, I have not looked at the specifics of the tool chain you set up there but mingw 6.2 looks like a very fine choice to me. This is almost a c++17 compiler ;). I would also assume the pthreads code works in those builds.

I just used whatever is there by default. The current limiting factor would be travis, the most recent linux distro version (which is the only simple way to use coverity, it can not be used from within docker) it provides is trusty, with gcc-4.8,

Can we agree then that using std::thread (as well as every other c++11 feature) is generally fine since you decided to require a c++11 compiler in your cmake build files?

I'm unable to answer that presently. All except Thread support library - yes . One argument against those would be that currently pthread is used. Using both at the same time will be confusing, and a questions like "which one to use in new code" will rise.

The question is whether any user (except Klaus) is currently depending on VS and what they have to say about dropping direct build support.

@LibRaw ?

Jan 11 '17 13:01 LebedevRI

(awake) We do not use threaded RawSpeed. We tried (about 3 years ago) and it looks broken.

BTW, entire move to C++17 looks too early for me, there are a lot of environments where new tools are not available (e.g. WinXP compatible binaries)

Jan 11 '17 13:01 LibRaw

We do not use threaded RawSpeed. We tried (about 3 years ago) and it looks broken.

FWIW in darktable we always use threading in rawspeed, and there has been zero cases where it was broken. Perhaps it was fixed after you tried it last time. Or perhaps it is platform-specific.

BTW, entire move to C++17 looks too early for me, there are a lot of environments where new tools are not available (e.g. WinXP compatible binaries)

There is no move to C++17, not even to C++14.

@LibRaw the question was specifically about dropping direct build support using MSVS.

Jan 11 '17 13:01 LebedevRI

One argument against those would be that currently pthread is used. Using both at the same time will be confusing, and a questions like "which one to use in new code" will rise.

True. But this is no argument against replacing the pthread code ;). And regarding new code: I would definitely argue against introducing even more pthread code. But I don't see the need for consensus on this specific question (new pthread code or not) as I can't see any reason why someone would come up with such a suggestion.

Jan 11 '17 13:01 axxel

For our LibRaw+RawSpeed projects we use qmake builds. VisualStudio 2010 compatibility is critical for this projects (so std::thread is not an option).
Multithreaded RawSpeed: we've experienced troubles only under OS X. Windows was OK.

Jan 11 '17 14:01 LibRaw

VisualStudio 2010 compatibility is critical for this projects (so std::thread is not an option).

Both things suck. I was unable to find a way to use MSVS as compiler from within cmake buildsystem+msys2/mingw64 packages, thus there is even no ci coverage for that 'compiler'.

Jan 11 '17 14:01 LebedevRI

May be, problem is not the compiler, but CI scripts?

Jan 11 '17 14:01 LibRaw

VisualStudio 2010 has only very patchy support for c++11. The decision to use the many improvements provided by c++11 (apart from threads) has been made and long been discussed (back in the old repo).

I spent about 3 weeks full time work on improving the code base after that decision has been made. (Meaning: I will strongly oppose a reversal of that decision). And VS 2010 is simply not going to be able to compile rawspeed from now on. ci system or not. threads or not.

Jan 11 '17 14:01 axxel

Folks, you've asked about our opinion. We use VisualStudio 2010 and we'll use it until we drop WinXP support. We use gcc 4.2 (old XCode) and we'll use it until we drop OS X 10.5 support. Our users still use WinXP (even XP/64) and OS X 10.5/10.6 so in 2017 we do not drop support of these systems.

If this will force us to do not use newer RawSpeed, so be it. May be, we'll backport your improvements back to support older build tools, may be not.

There are a lot of strange build environments, especially in mobile and embedded ecosystems (e.g. print kiosks), it is much better to maintain compatibility with, at least, 6-8 years old hardware and software.

Jan 11 '17 14:01 LibRaw

Of course. Sorry for my 'bite reflex' ;).

Then you will indeed miss out on new rawspeed improvements until you drop support for those two compilers. We (or I) don't intend to change the behaviour or the public api just the internals. So you could set up something that uses the new code where it works for you and uses the old code where it does not without change in your own code.

Outside visible improvements are going to be increased speed (around 10% overall and even 20-30% speedup for NEF files).

Jan 11 '17 14:01 axxel

So be it

Jan 11 '17 14:01 LibRaw

@LibRaw thank you for replying and informing about your expectations!

Jan 11 '17 14:01 LebedevRI

To come back to the original question that @LebedevRI asked: you have your own build system based on qmake and therefore had no use for the VC solution and project files that have been removed by him about two weeks ago, right?

In case you were dependent on the pthread and libjpeg binaries that were in the original repo, you would have to provide them in your own build system from now on.

Jan 11 '17 14:01 axxel

Yes, we do not use VisualStudio projects that coming with RawSpeed, but generate own projects using qmake.

Jan 11 '17 15:01 LibRaw

@LibRaw Thanks for your input from my side as well (and sorry again for my tone earlier - I just spend so much time on this and saw it all go down the drain for a split second ;)).

To summarize:

Adding OpenMP code to RS is fine as it is inherently 'optional' and does not break backward compatibility
Mixing pthread and std::thread code is at least confusing and therefore not advised.
Replacing the pthread code with std::thread does not make too much sense, as OpenMP would be a more suitable replacement anyway. But we don't want potential users of the pthread code loose threading support just because they don't have the tool support.
Replacing the pthread code with OpenMP will be the long term approach.

Any objections to this summary?

Interestingly @LibRaw would have been such a user who would loose pthread support but as the discussion showed they are not using it anyway because they found it to be unstable. I'm just saying ;).

Jan 11 '17 15:01 axxel

We hope to move to new wonderful RawSpeed at some time.

Jan 11 '17 15:01 LibRaw

@LibRaw i have taken one more look at compiling rawspeed with msvs, and even the latest version available on appveyor, VS2015, fails to compile perfectly valid C++.

In particular https://connect.microsoft.com/VisualStudio/feedback/details/811369/local-struct-class-with-in-class-initialization-of-member-causes-compilation-failure.

I'm not sure i want to garble the code just because just some one compiler has rather broken implementation of C++ standard.

On a positive side, i do not expect for rawspeed to switch to C++14 soon, the current minimal compiler requirement is still gcc-4.8. So maybe MSVS will catch up :)

@axxel unfortunately i now have 2 more reasons against openmp:

Apparently, threadsanitizer does not know anything about it. In particular, about critical() section
According to one rawtherapee dev, latest zlib-1.2.11 has some issue, and their code for unpacking deflate dng's, which uses openmp, is now broken. I'm unable to reproduce in rawspeed with zlib-1.2.11

TLDR: openmp is strictly only for tools. At this point, the library code must only use plain pthreads.

Feb 13 '17 17:02 LebedevRI

@axxel

my understanding of the slices in CR2 files was inadequate. Unfortunately the different slices are still encoded as one big blob of JPEG data. I was hoping they would encode it as individual smaller JPEG blobs which would have allowed to decode them in parallel.

Some time ago I tried to optimize decode of cr2 files in RawTherapee because decode of cr2 files was really slow. I managed to transfer some work to a second core. Maybe a similar approach is possible in rawspeed though I don't know anything about the rawspeed cr2 decode code at all. For reference: https://github.com/Beep6581/RawTherapee/blob/dev/rtengine/dcraw.cc#L907

Feb 16 '17 18:02 heckflosse

because decode of cr2 files was really slow

I can only agree with that. I used dcraw in my original design as well. Having pretty much all of my pixel processing (demosaicing, color management, bad pixel correction, etc.) running on the GPU the actual decoding of the raw became 'visible'. That is why I switched to rawspeed, which was substantially faster (I don't remember the exact numbers). Now I managed to squeeze out about another 15-20% for cr2 files. I did some experiments with SSE intrinsics, trying to reduce the 35% front-end-stalls ('idle cpu'), but there was nothing to gain from that so far.

At first, I tried to speed up dcraw as well, mostly by removing ifs that were not relevant for cr2s (I only cared for cr2s). But that got me only so far. The one big advantage of the rawspeed decoder was what used to be called 'bigtable'. The current version can be found here: https://github.com/axxel/rawspeed/blob/develop/src/librawspeed/decompressors/HuffmanTable.h#L232

Anyway, regarding my mentioned idea above about parallelizing the cr2 slice decoding: I did not fully understand at the time what Canon actually does. It turned out, they simply reordered the pixels in the jpeg stream, but it is still one contiguous stream -> bad luck.

Your code seems to pipeline two operations: the actual jpeg decoding and some byte shuffling + curve lookup. The shuffling is unnecessary in rawspeed, because the results are written directly to the final destination in the frame buffer and the curve lookup does simply not happen (in the cr2 case). So I don't see how that could help here, but thanks a lot for your input.

Feb 16 '17 20:02 axxel

@axxel Ok, thanks for your answer. I just wanted to mention it :)

Feb 16 '17 20:02 heckflosse

@parafin re OSX: Have you seen https://gitlab.kitware.com/cmake/cmake/issues/17775 https://gitlab.kitware.com/cmake/cmake/merge_requests/1812 https://iscinumpy.gitlab.io/post/omp-on-high-sierra/ https://github.com/Kitware/CMake/commit/e3cd7c1e014aa48b72d414104b9c1bd6bc03fd64#diff-ae0a179d6813ec707caeada5cab5e63e ? That suggests OpenMP is now working with XCode.

With that, i think one of my main points regarding not using openmp here is //mostly// gone.

Oct 14 '18 13:10 LebedevRI

@LebedevRI, yeah I've seen that XCode now supports OpenMP, but I'm yet to update it because that also requires OS update. Will take a look sometime.

Oct 15 '18 08:10 parafin

Would be really great to get away from requiring CC=gcc-openmp (but keeping CXX=xcode-noopenmp sic).

Oct 15 '18 08:10 LebedevRI

rawspeed rawspeed copied to clipboard

Improve/cleanup the multi-threading support

rawspeed
rawspeed copied to clipboard