MLV-App
Dual iso rendering/export very slow
clip: mlv lite + sound, raw dual iso, 14bit lossless, 720p, 25fps, 01:38:33, coming from a 5d mkiii on a 1066x 128gb cf
machine: intel xeon x5650 2.66ghz (6 cores/12 threads), 16gb ram, evo 850 256gb, w10 x64, gtx1060
~37hours (dual iso on, interpolation AMaZE, alias on, blend on)
~26hours (dual iso on, interpolation AMaZE, alias off, blend off)
~15hours (dual iso on, interpolation mean, alias on, blend on)
~15hours (dual iso on, interpolation mean, alias on, blend on) + processing (color correction)
~8hours (dual iso on, interpolation mean, alias off, blend off)
~3hours (dual iso off)
all with the same export settings (CinemaDNG uncompressed)
i'm shooting live concerts as a favor for a friend and was toying with magic lantern to maybe get an even better result, but with these processing times, before the footage has even been touched in premiere pro, it's a no-go
the app is not using any gpu, and barely my cpu (~8%)
i understand this process takes time but with many cores available it should be possible to assign multiple workers to make it more time-efficient by making better use of available resources
either way, great app :)
Thanks for the issue entry and your tests. 👍 I did not know that this is so slow, because I never used this combination (only for some small tests).
CinemaDNG export runs on one thread only at the moment (@bouncyball-git : am I right, or is it already multithreaded? Could you please check that? Is the AMaZE algorithm the same as for debayering and multithreaded?). All ffmpeg and AVFoundation exports use multithreading and are way faster, but also here, dualIso should be the bottleneck.
The app uses no GPU - that is why it runs on nearly every computer, unlike most other video apps. Maybe in the far future we will support both, with GPU use as an option. But until now we have had no real success using the GPU. 😄 If you (or someone else) have GPU programming skills, that would be very welcome!
Again @bouncyball-git : is it possible to run multiple saveDngFrame( m_pMlvObject, cinemaDng, frame, dngFileName.data() ) calls in parallel? I did that in v0.8 with the PNG export (before ffmpeg started converting PNG to movie) using QThreadPool (see function startExport() and class RenderPngTask). Could this be a way here, or can just one frame be rendered at once? If I understand the llrawproc code right, it is not really made for multithreading...
If all other components of a computer with a 16-thread CPU are fast enough, that would mean nearly 37 hours (see above) divided by 16, about 2 hours 20 minutes... that would be enormous!
It is not thread safe :(. But we could try to do what you did in v0.8.
Would it be possible to get it thread safe with the other amaze code? I am not sure if it is more thread safe when using the Qt solution... (I used a mutex, so I think it is not really better) 😢
Do you run threads on ranges of frames like 0-N, N-M, M-X, X-...? I think if the initialization of the llrawproc stuff is done only once, then saving threads can be used. I'm not very sure about it, but this might work :)
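A minimal sketch of the frame-range idea, assuming each worker gets one contiguous block of frames. `frame_range_t` and `frame_range_for_worker` are hypothetical names for illustration; nothing here comes from MLV App or llrawproc, and it does not address whether the per-frame processing itself is thread safe.

```c
#include <stddef.h>

/* Hypothetical helper type: a half-open frame range [first, last). */
typedef struct {
    size_t first;
    size_t last;
} frame_range_t;

/* Split `frames` frames into `workers` contiguous ranges whose sizes
 * differ by at most one, and return the range for worker `index`.
 * Assumes workers > 0 and index < workers. */
frame_range_t frame_range_for_worker(size_t frames, size_t workers, size_t index)
{
    size_t base  = frames / workers;   /* frames every worker gets */
    size_t extra = frames % workers;   /* first `extra` workers get one more */
    frame_range_t r;

    r.first = index * base + (index < extra ? index : extra);
    r.last  = r.first + base + (index < extra ? 1 : 0);
    return r;
}
```

With 10 frames and 4 workers this yields the ranges 0-3, 3-6, 6-8 and 8-10 (end exclusive), so every frame is covered exactly once.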
No, never tried that... could also be a way.
I found something very interesting that could solve our speed problems in dual iso:
https://en.wikipedia.org/wiki/OpenMP
This parallelizes for loops with a single #pragma omp parallel for command, and we have enough for loops in dual iso.
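A minimal sketch of what that pragma looks like on a simple per-pixel loop. `scale_pixels` is a made-up function for illustration, not code from the dual iso module. With GCC or MinGW the pragma only takes effect when compiling with -fopenmp; without the flag it is ignored and the loop simply runs serially.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel operation: multiply every 16-bit sample by a
 * non-negative gain and clamp the result to the 16-bit range. */
void scale_pixels(uint16_t *pixels, size_t count, float gain)
{
    /* Each iteration reads and writes only pixels[i], so the iterations
     * are independent and can be distributed across threads. */
    #pragma omp parallel for
    for (long i = 0; i < (long)count; i++) {
        float v = (float)pixels[i] * gain;
        pixels[i] = (uint16_t)(v > 65535.0f ? 65535.0f : v);
    }
}
```

The loop index is a signed integer and the header has the simple `init; test; increment` shape, which is the form OpenMP expects for a parallel for loop.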
The problem is, I can't get it to work on OSX, because it needs the llvm compiler installed via brew, which seems not to be compatible with Qt. (at least I did not get it to work)
Edit: on Windows MinGW that really works! DualIso is faster now, but still far away from non dual iso speed... and I damaged it in a way bright areas are strange and program crashes after some pictures 😄
Edit2: have now a change where it works... feels much better in app, but in export it was 12 to 16 seconds... I am disappointed.
Hey man I've been thinking about openmp too, but I had no experience with it so far.
Very nice that you tried it!!! Can I play with your mod under Linux?
Yes, you can try it out. For me linux worked so far...
On mac this could work... but I can't try it out here on my old 10.9 and XCode 6.1 ... brew says it works starting with XCode 6.2 (who knows if that's true). https://stackoverflow.com/questions/44380459/is-openmp-available-in-high-sierra-llvm/47230419#47230419 You can get the compiled libomp in the packages here: http://releases.llvm.org/download.html But I run into big trouble using it...
@bouncyball-git : does it work for you on Linux? How much is the performance improved on your system? Is there any difference? On OSX I gave up for now. openmp comes with a special stdlib.h which is incompatible with the one we use for all variable definitions - so nothing else than the #pragma ... is working anymore when compiling 😆 😂
I'll try this evening. Had to do lots of family stuff :). Even bought a new refrigerator (BTW it's a German brand ;)).
@masc4ii Hey man, I've tested the OpenMP version on linux. I can confirm it is ~2x faster. But...
Watch this video (1st older version, 2nd newer).
Ouch... that does not look good. Does that happen if you stop playback?
No, it happens randomly during playback. No action from user is done.
Should we leave OpenMP by default? I think it gives about 30% speedup but works incorrectly as discussed above. Maybe get rid of it?
Yapp. Reverted. https://github.com/ilia3101/MLV-App/commit/de1808f55fc6c368ef08ca3729932e25c6ba53c9
Could you please test if it is better now? There are more openmp calls in other files...
It seems dualiso now works w/o issues. pixelproc.c with openmp too. Would be nice to try the openmp pragma stuff on stripes, pattern noise and chroma smooth.
Often it is not possible to use the "pragma parallel for" on a for loop. This is the case if there are 2 loop variables in the for header, or if the iterations of the loop depend on each other.
Not possible:
for(int i=0, y=0; i < 5; i++, y++)
{
...
}
And this does not work either, because the iterations can't run in parallel: each one needs the value of i left behind by the previous one.
int i = 0;
for( int y = 0; y < 5; y++)
{
i++;
a[y] = i;
}
As far as I remember, there are no such cases.
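A loop like the second one above can sometimes be rewritten so that each iteration computes its value from the loop index alone. A small sketch, assuming the running counter is the only dependency; `fill_serial` and `fill_parallel` are illustrative names, not functions from the project.

```c
/* Serial version from the example above: the counter i carries state
 * from one iteration to the next, so iterations cannot be reordered. */
void fill_serial(int *a, int n)
{
    int i = 0;
    for (int y = 0; y < n; y++) {
        i++;
        a[y] = i;
    }
}

/* Rewritten version: the value depends only on y, so every iteration
 * is independent and the pragma can be applied safely. */
void fill_parallel(int *a, int n)
{
    #pragma omp parallel for
    for (int y = 0; y < n; y++) {
        a[y] = y + 1;
    }
}
```

Both functions fill the array with 1, 2, 3, ... but only the second is safe to split across threads. Not every dependency can be removed this way.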
In pixelproc.c I tried the pragma stuff. There are some "for" left, but they have this "add_pixel_to_map" - here it is not possible I think, because it could happen at the same time twice (a kind of the second case I wrote above). Here we would need a kind of OMP-Mutex (does it exist?).
Stripes.c: it could work in stripes_apply_correction().
Patternnoise.c: there are a lot of points where it could work.
Chroma_smooth.c could also work... Let's try out!
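On the "OMP-Mutex" question for pixelproc.c: OpenMP does have `#pragma omp critical`, which lets only one thread at a time execute the marked block. A rough sketch under the assumption that the map is a simple append-only list; `pixel_map_t`, `add_to_map` and `find_hot_pixels` are hypothetical stand-ins, not the real add_pixel_to_map or its data structure.

```c
#include <stddef.h>

#define MAP_CAPACITY 1024

/* Hypothetical stand-in for a pixel map: a fixed-size list of coordinates. */
typedef struct {
    size_t count;
    int x[MAP_CAPACITY];
    int y[MAP_CAPACITY];
} pixel_map_t;

/* Appending modifies shared state (count), so the critical section
 * ensures only one thread at a time performs the append. */
void add_to_map(pixel_map_t *map, int x, int y)
{
    #pragma omp critical(pixel_map_append)
    {
        if (map->count < MAP_CAPACITY) {
            map->x[map->count] = x;
            map->y[map->count] = y;
            map->count++;
        }
    }
}

/* Scan a frame in parallel by rows and record every sample above threshold. */
void find_hot_pixels(const unsigned short *frame, int width, int height,
                     unsigned short threshold, pixel_map_t *map)
{
    #pragma omp parallel for
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            if (frame[row * width + col] > threshold) {
                add_to_map(map, col, row);
            }
        }
    }
}
```

Two caveats: the order of entries in the map is no longer deterministic when run in parallel, and if most pixels end up in the critical section the threads mostly wait on each other, so the speedup can be small.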
Try out https://github.com/ilia3101/MLV-App/commit/a031747d5e36a2c80946d6db3fb2d7e2a7988fb3 😊
As I've tested, all been working good so far. Pattern noise sucks as always ;) (slower than even hard dual iso processing) I like the speed of forced bad pixel removal. Stripes also seem to work OK.
Thank you :)
Oh yes! CS 5x5 had good speed I think. 2x2 has barely noticeable performance impact.
What exactly does pattern noise do? It always looks worse than without... I really don't know why one would use it. Yes, CS is faster... what a shame that this does not work on Mac.
If you like: take the dualiso with the pragma's and try out each single one on the reverted file. Maybe you'll find the one doing this stupid error... (but there are many pragma lines).
I've been tracking down the bug which I described in this video, and this brought me to the frame caching part, as always 😉. Investigating further, I could not determine the exact issue yet... multithreading rocks and sucks at the same time hahaha 😃
What I'm doing now is opening six instances of MLV App, all loading the same session XML file. Then from the first instance I export the first 1/6th of the clips, from the second the second 1/6th of the clips, etc. Takes a bit more time to set up but saves a lot of waiting time :) Works great on Windows, because there it opens another instance by default. On MacOS it will focus on the already opened instance of MLV App.
Hmmm... the "poor mans multithreading". 😄 The problem with dual iso is, that the current algorithm is 100% single threaded. We had no luck at all to change that easily.
I tried with MLVFS today, which converts to dng very quickly. But the Dual ISO results from MLV App are much better, so I'll stick with MLV App for Dual ISO. Faster Dual ISO export from MLV App would be awesome, even if it's just a version of parallel export: "poor mans multithreading" :p On the other hand, the combination of MLVFS for non Dual ISO and MLV App for the Dual ISO clips is not too time consuming either (assuming not too many Dual ISO shots).
Actually I think MLV App is so awesome, I'd love to use it as the only app to process MLV files. So I made a little Automator workflow (MacOS only) to help me with my "poor mans multithreading" 😄. It's used by selecting MLV files in the Finder, right clicking and then under Services choose "MLV App instances". It will evenly spread all the MLV files over as many MLV App instances as the computer has logical cores. For my MacBook, quad core with hyper threading, that means 8 threads and thus 8 MLV App instances. Then I have to manually start the export 8 times. Took exporting 22 MLV files down from 23 minutes to 8 minutes 🙂 As the next step I'd like to look into a post processing script to automatically apply white balance corrections to the exported DNG files with dcraw, like what @dannephoto does with his Switch app.
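The "spread evenly over instances" step can be sketched as a round-robin assignment. This is only an illustration of the arithmetic, not the Automator workflow itself; `instance_for_file` and `files_on_instance` are made-up names, and both assume the instance count is greater than zero.

```c
#include <stddef.h>

/* Round-robin: file number k (0-based) goes to instance k % instances. */
size_t instance_for_file(size_t file_index, size_t instances)
{
    return file_index % instances;
}

/* Number of files that end up on a given instance under that scheme:
 * every instance gets files / instances, and the first
 * files % instances instances get one more. */
size_t files_on_instance(size_t files, size_t instances, size_t instance)
{
    size_t count = files / instances;
    if (instance < files % instances) {
        count++;
    }
    return count;
}
```

For the 22 files and 8 instances mentioned above, six instances receive 3 files and two receive 2.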
