
Dual iso rendering/export very slow

Open gac123 opened this issue 6 years ago • 38 comments

Clip: mlv lite + sound, raw, dual iso, 14bit lossless, 720p 25fps, 01:38:33, coming from a 5d mkiii on a 1066x 128gb cf. Machine: intel xeon x5650 2.66ghz (6 cores/12 threads), 16gb, evo 850 256gb, w10 x64, gtx1060

~37hours (dual iso on, interpolation AMaZE, alias on, blend on)
~26hours (dual iso on, interpolation AMaZE, alias off, blend off)
~15hours (dual iso on, interpolation mean, alias on, blend on)
~15hours (dual iso on, interpolation mean, alias on, blend on) + processing (color correction)
~8hours (dual iso on, interpolation mean, alias off, blend off)

~3hours (dual iso off)

all with the same export settings (CinemaDNG uncompressed)
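
For scale, the timings above can be turned into a per-frame cost. This is only a rough sketch: it assumes 01:38:33 is the clip duration in h:mm:ss and that the 25fps is constant, and the function names are made up for illustration.

```c
#include <assert.h>

/* Total number of frames in a clip of the given duration at a fixed frame rate. */
long clip_frames(int hours, int minutes, int seconds, int fps)
{
    assert(fps > 0);
    return ((long)hours * 3600 + (long)minutes * 60 + seconds) * fps;
}

/* Average export time per frame, in seconds, for a total export time in hours. */
double seconds_per_frame(double export_hours, long frames)
{
    assert(frames > 0);
    return export_hours * 3600.0 / (double)frames;
}
```

Under those assumptions the clip has 147,825 frames, so ~37hours is roughly 0.9 s per frame and ~3hours is roughly 0.07 s per frame.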

i'm shooting live concerts as a favor for a friend and was toying with magic lantern to maybe get an even better result, but with these processing times not even having it touched in premiere pro, it's a no-go

the app is not using any gpu, and barely my cpu (~8%)

i understand this process takes time but with many cores available it should be possible to assign multiple workers to make it more time-efficient by making better use of available resources

either way, great app :)

gac123 avatar Apr 04 '18 17:04 gac123

Thanks for the issue entry and your tests. 👍 I did not know that this is so slow, because I never used this combination (only for some small tests).

CinemaDNG export runs on one thread only at the moment (@bouncyball-git : am I right, or is it already multithreaded? Could you please check that? Is the AMaZE algorithm the same as for debayering and multithreaded?). All ffmpeg and AVFoundation exports use multithreading and are way faster, but also here, dualIso should be the bottleneck.

The app uses no GPU - that is why it runs on nearly every computer, unlike most other video apps. Maybe in the far future we will support both, with GPU use as an option. But so far we have had no real success using the GPU. 😄 If you (or someone else) have GPU programming skills, that would be very welcome!

masc4ii avatar Apr 04 '18 18:04 masc4ii

Again @bouncyball-git : is it possible to run multiple saveDngFrame( m_pMlvObject, cinemaDng, frame, dngFileName.data() ) in parallel? I did that in v0.8 with the PNG export (before ffmpeg started converting png to movie) using QThreadPool (see function startExport() and class RenderPngTask). Could this be a way here, or can just one frame be rendered at once? If I understand the llrawproc code right, it is not really made for multithreading... If all other components on a 16 thread-CPU-computer are fast enough, that would nearly mean 37hours (see above) divided by 16 = 2:20hours... that would be enormous!
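
The 37 / 16 estimate is the ideal case where all of the work can be spread over the threads. As a hedged illustration (not code from MLV-App), Amdahl's law gives the expected time when only a fraction p of the work can run in parallel; the function names and the value of p are assumptions.

```c
#include <assert.h>

/* Amdahl's law: speedup on n threads when a fraction p (0..1) of the work is parallel. */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Estimated export time on n threads, given the single-thread time in hours. */
double estimated_hours(double single_thread_hours, double p, int n)
{
    return single_thread_hours / amdahl_speedup(p, n);
}
```

With p = 1.0 and 16 threads, 37 hours becomes about 2.31 hours (roughly the 2:20 above). If only half of the work were parallel (p = 0.5), it would still take about 19.7 hours.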

masc4ii avatar Apr 04 '18 19:04 masc4ii

It is not thread safe :(. But we could try do what you did in v0.8.

bouncyball-git avatar Apr 09 '18 09:04 bouncyball-git

Would it be possible to get it thread safe with the other amaze code? I am not sure if it is more thread safe when using the Qt solution... (I used a mutex, so I think it is not really better) 😢

masc4ii avatar Apr 09 '18 09:04 masc4ii

Do you run threads like ranges of frames 0-N, N-M, M-X, X-...? I think if initialization of the llrawproc stuff will be done once then saving threads can be used. I'm not very sure about it but this might work :)
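
A minimal sketch of the range idea, assuming each saving thread gets one contiguous block of frames. The `frame_range` type and the function name are hypothetical, and this only computes the ranges; it does not touch llrawproc or make it thread safe.

```c
#include <assert.h>

/* Half-open frame range [first, end) handled by one worker. */
typedef struct {
    int first;
    int end;
} frame_range;

/* Split total_frames into `workers` contiguous ranges whose sizes differ by at most one. */
frame_range frame_range_for_worker(int total_frames, int workers, int worker)
{
    assert(total_frames >= 0 && workers > 0 && worker >= 0 && worker < workers);
    int base = total_frames / workers;   /* frames every worker gets */
    int extra = total_frames % workers;  /* the first `extra` workers get one more */
    frame_range r;
    r.first = worker * base + (worker < extra ? worker : extra);
    r.end = r.first + base + (worker < extra ? 1 : 0);
    return r;
}
```

For 10 frames and 4 workers this gives 0-3, 3-6, 6-8 and 8-10 (end exclusive).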

bouncyball-git avatar Apr 09 '18 17:04 bouncyball-git

No, never tried that... could also be a way.

masc4ii avatar Apr 10 '18 06:04 masc4ii

I found something very interesting that could solve our speed problems in dual iso: https://en.wikipedia.org/wiki/OpenMP This parallelizes for loops with a single #pragma omp parallel for command, and we have plenty of for loops in dual iso.
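
A minimal sketch of what such a loop looks like. The function is a made-up example, not taken from the dual iso code; it only shows the pragma on a loop whose iterations do not depend on each other.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply every sample by gain, clamping to 16 bit. Each iteration touches
 * only samples[i], so the iterations can run on different threads. */
void scale_samples(uint16_t *samples, int count, unsigned gain)
{
    assert(count >= 0);
    #pragma omp parallel for
    for (int i = 0; i < count; i++) {
        uint32_t v = (uint32_t)samples[i] * (uint32_t)gain;
        samples[i] = (uint16_t)(v > 65535u ? 65535u : v);
    }
}
```

With GCC or MinGW the pragma takes effect when compiling with -fopenmp; without that flag it is ignored and the loop simply runs on one thread.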

The problem is, I can't get it to work on OSX, because it needs the llvm compiler installed via brew, which seems not to be compatible with Qt. (at least I did not get it to work)

Edit: on Windows MinGW that really works! DualIso is faster now, but still far away from non dual iso speed... and I broke it in a way that bright areas look strange and the program crashes after some pictures 😄

Edit2: have now a change where it works... feels much better in app, but in export it was 12 to 16 seconds... I am disappointed.

masc4ii avatar Jul 13 '18 08:07 masc4ii

Hey man I've been thinking about openmp too, but I had no experience with it so far.

Very nice that you tried it!!! Can I play with your mod under Linux?

bouncyball-git avatar Jul 13 '18 14:07 bouncyball-git

Yes, you can try it out. For me linux worked so far...

masc4ii avatar Jul 13 '18 14:07 masc4ii

On mac this could work... but I can't try it out here on my old 10.9 and XCode 6.1 ... brew says it works starting with XCode 6.2 (who knows if that's true). https://stackoverflow.com/questions/44380459/is-openmp-available-in-high-sierra-llvm/47230419#47230419 You can get the compiled libomp in the packages here: http://releases.llvm.org/download.html But I ran into big trouble using it...

masc4ii avatar Jul 13 '18 19:07 masc4ii

@bouncyball-git : does it work for you on Linux? How much is the performance improved on your system? Is there any difference? On OSX I gave up for now. openmp comes with a special stdlib.h which is incompatible with the one we use for all variable definitions - so nothing other than the #pragma ... works anymore when compiling 😆 😂

masc4ii avatar Jul 15 '18 10:07 masc4ii

I'll try this evening. Had to do lots of family stuff :). Even bought a new refrigerator (BTW it's a German brand ;)).

bouncyball-git avatar Jul 15 '18 13:07 bouncyball-git

@masc4ii Hey man I've tested the OpenMP version on linux. I can confirm it is ~2x faster. But...

Watch this video (1st older version, 2nd newer).

bouncyball-git avatar Jul 16 '18 09:07 bouncyball-git

Ouch... that does not look good. Does that happen if you stop playback?

masc4ii avatar Jul 16 '18 15:07 masc4ii

No, it happens randomly during playback. No action from user is done.

bouncyball-git avatar Jul 16 '18 17:07 bouncyball-git

Should we leave OpenMP by default? I think it gives about 30% speedup but works incorrectly as discussed above. Maybe get rid of it?

bouncyball-git avatar Jul 30 '18 13:07 bouncyball-git

Yapp. Reverted. https://github.com/ilia3101/MLV-App/commit/de1808f55fc6c368ef08ca3729932e25c6ba53c9

Could you please test if it is better now? There are more openmp calls in other files...

masc4ii avatar Jul 30 '18 13:07 masc4ii

It seems dualiso now works w/o issues. pixelproc.c with openmp too. Would be nice to try the openmp pragma stuff on stripes, pattern noise and chroma smooth.

bouncyball-git avatar Jul 30 '18 18:07 bouncyball-git

Often it is not possible to use "#pragma omp parallel for" on a for loop. This is the case if there are 2 variables in the for header, or if the iterations of the loop depend on each other.

Not possible:

for (int i = 0, y = 0; i < 5; i++)
{
...
}

And this does not work either, because the iterations can't run in parallel: each one needs the value of i left by the previous one.

int i = 0;
for( int y = 0; y < 5; y++)
{
    i++;
    a[y] = i;
}
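
A dependency like this can sometimes be removed by computing the value from the loop index instead of carrying it between iterations. A small sketch under that assumption (both function names are made up; the first one is the loop above wrapped in a function):

```c
#include <assert.h>

/* The loop from above: each iteration needs the i left by the previous one. */
void fill_serial(int *a, int n)
{
    int i = 0;
    for (int y = 0; y < n; y++) {
        i++;
        a[y] = i;
    }
}

/* Same result, but i is derived from y, so the iterations are independent. */
void fill_parallel(int *a, int n)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int y = 0; y < n; y++) {
        int i = y + 1;  /* local to the iteration, nothing shared between threads */
        a[y] = i;
    }
}
```

This only works when the carried value has a closed form in terms of the index; a running sum over pixel data, for example, would need a different approach.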

masc4ii avatar Jul 30 '18 18:07 masc4ii

As far as I remember, there are no such cases.

bouncyball-git avatar Jul 31 '18 12:07 bouncyball-git

In pixelproc.c I tried the pragma stuff. There are some "for" left, but they have this "add_pixel_to_map" - here it is not possible I think, because it could happen at the same time twice (a kind of the second case I wrote above). Here we would need a kind of OMP-Mutex (does it exist?).
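
On the OMP-Mutex question: OpenMP does have this, as `#pragma omp critical` (and also `#pragma omp atomic` and the `omp_lock_t` functions). A minimal sketch, where `pixel_map`, `map_add` and `find_bad_pixels` are hypothetical stand-ins and not the real add_pixel_to_map code:

```c
#include <assert.h>

#define MAP_CAPACITY 1024

typedef struct { int x; int y; } pixel_xy;

typedef struct {
    pixel_xy pixels[MAP_CAPACITY];
    int count;
} pixel_map;

/* Append one pixel. The critical section lets only one thread at a time
 * read and update map->count, so two threads cannot claim the same slot. */
void map_add(pixel_map *map, int x, int y)
{
    #pragma omp critical(pixel_map_append)
    {
        if (map->count < MAP_CAPACITY) {
            map->pixels[map->count].x = x;
            map->pixels[map->count].y = y;
            map->count++;
        }
    }
}

/* Scan rows in parallel and record every pixel at or above threshold. */
int find_bad_pixels(const unsigned short *img, int width, int height,
                    unsigned short threshold, pixel_map *map)
{
    assert(img != 0 && map != 0 && width >= 0 && height >= 0);
    map->count = 0;
    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (img[y * width + x] >= threshold)
                map_add(map, x, y);
        }
    }
    return map->count;
}
```

Two caveats: with several threads the order of the entries in the map is no longer deterministic, and if most iterations end up in the critical section the loop is serialized again and little time is gained.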

Stripes.c: it could work in stripes_apply_correction().

Patternnoise.c: there are a lot of points where it could work.

Chroma_smooth.c could also work... Let's try out!

masc4ii avatar Jul 31 '18 13:07 masc4ii

Try out https://github.com/ilia3101/MLV-App/commit/a031747d5e36a2c80946d6db3fb2d7e2a7988fb3 😊

masc4ii avatar Jul 31 '18 14:07 masc4ii

From what I've tested, everything has been working well so far. Pattern noise sucks as always ;) (slower than even hard dual iso processing) I like the speed of forced bad pixel removal. Stripes also seem to work OK.

Thank you :)

Oh yes! CS 5x5 had good speed I think. 2x2 has barely noticeable performance impact.

bouncyball-git avatar Jul 31 '18 16:07 bouncyball-git

What exactly does pattern noise do? It always looks worse than without... I really don't know why anyone would use it. Yes, CS is faster... what a shame that this does not work on Mac.

masc4ii avatar Jul 31 '18 17:07 masc4ii

If you like: take the dualiso with the pragma's and try out each single one on the reverted file. Maybe you'll find the one doing this stupid error... (but there are many pragma lines).

masc4ii avatar Jul 31 '18 17:07 masc4ii

I've been tracking down the bug which I described in this video and this brought me to the frame caching part as always 😉. Investigating further I could not determine the exact issue yet... multithreading rocks and sucks at the same time hahaha 😃

bouncyball-git avatar Aug 10 '18 06:08 bouncyball-git

What I'm doing now is opening six instances of MLV App, all loading the same session XML file. Then from the first instance I export the first 1/6th of the clips, and on the second the second 1/6th of the clips etc. Takes a bit more time to setup but saves a lot of waiting time :) Works great on Windows, because there it opens another instance by default. On MacOS it will focus on the already opened instance of MLV App.

Jip-Hop avatar Aug 12 '20 16:08 Jip-Hop

Hmmm... the "poor man's multithreading". 😄 The problem with dual iso is that the current algorithm is 100% single threaded. We had no luck at all changing that easily.

masc4ii avatar Aug 12 '20 16:08 masc4ii

I tried with MLVFS today, which converts to dng very quickly. But the Dual ISO results from MLV App are much better, so I'll stick with MLV App for Dual ISO. Faster Dual ISO export from MLV App would be awesome, even if it's just a version of parallel export: "poor man's multithreading" :p On the other hand, the combination of MLVFS for non Dual ISO and MLV App for the Dual ISO clips is not too time consuming either (assuming not too many Dual ISO shots).

Jip-Hop avatar Aug 13 '20 18:08 Jip-Hop

Actually I think MLV App is so awesome, I'd love to use it as the only app to process MLV files. So I made a little Automator workflow (MacOS only) to help me with my "poor man's multithreading" 😄. It's used by selecting MLV files in the Finder, right clicking and then under Services choosing "MLV App instances". It will evenly spread all the MLV files over as many MLV App instances as the computer has logical cores. For my MacBook, quad core with hyper threading, that means 8 threads and thus 8 MLV App instances. Then I have to manually start the export 8 times. Took exporting 22 MLV files down from 23 minutes to 8 minutes 🙂 As the next step I'd like to look into a post processing script to automatically apply white balance corrections to the exported DNG files with dcraw, like what @dannephoto does with his Switch app.
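
The "evenly spread" step could look like the sketch below. It assumes a simple round-robin assignment, which may not be what the Automator workflow actually does, and the function names are made up for illustration.

```c
#include <assert.h>

/* Round-robin: file 0 goes to instance 0, file 1 to instance 1, and so on,
 * wrapping around after the last instance. */
int instance_for_file(int file_index, int instances)
{
    assert(file_index >= 0 && instances > 0);
    return file_index % instances;
}

/* Number of files a given instance ends up with under that assignment. */
int files_for_instance(int total_files, int instances, int instance)
{
    assert(total_files >= 0 && instances > 0);
    assert(instance >= 0 && instance < instances);
    return total_files / instances + (instance < total_files % instances ? 1 : 0);
}
```

For 22 files and 8 instances, six instances get 3 files and two get 2. The total time is still set by whichever instance gets the longest clips, so the split is only even by file count.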

Screenshot 2020-08-16 at 01 10 33

MLV App instances.workflow.zip

Jip-Hop avatar Aug 16 '20 00:08 Jip-Hop