openh264

Why does DecodeFrameNoDelay take ~100ms per frame?

Open • tottaka opened this issue 2 years ago • 20 comments

Hi, is there a way to speed up the decoder, or can someone explain why decoding a single 1920x1080 frame takes 100 ms? Is this normal? Since the project mentions real-time applications such as WebRTC, I assume I must be doing something wrong. I compiled the current master branch on Win64 using the AutoBuildForWindows.bat method.

tottaka, Mar 17 '23 21:03

For reference, here is how I call the method, including how I measure the decode time:

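```cpp
#include <chrono>
#include <cstdio>
#include <cstring>
#include <wels/codec_api.h>  // OpenH264 API; include path depends on your setup
using namespace std::chrono;

// decoder (ISVCDecoder*), pSrc and iSrcLen are set up elsewhere.
unsigned char* buffer[3];  // receives the Y, U and V plane pointers
SBufferInfo bufInfo;
memset(&bufInfo, 0, sizeof(bufInfo));

auto start = high_resolution_clock::now();
DECODING_STATE rc = decoder->DecodeFrameNoDelay(pSrc, iSrcLen, buffer, &bufInfo);
auto stop = high_resolution_clock::now();

// duration_cast needs an explicit target unit; microseconds / 1000 gives ms.
auto duration = duration_cast<microseconds>(stop - start);
long decodeMs = static_cast<long>(duration.count() / 1000);
printf("Decode time: %ld milliseconds.\n", decodeMs);
```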
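A reusable variant of this measurement, as a sketch: `timeMs` is a hypothetical helper, not an OpenH264 function. It uses `std::chrono::steady_clock`, which is guaranteed to be monotonic (`high_resolution_clock` may be an alias for the non-monotonic `system_clock` on some standard libraries), and it returns fractional milliseconds so fast stages are not truncated to 0. Wrapping each stage (decode, colour conversion, upload) in its own call keeps their costs separate.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper (not part of OpenH264): runs f() once and returns the
// elapsed wall-clock time in milliseconds as a double, so sub-millisecond
// stages are not rounded down to 0.
template <typename F>
double timeMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start);  // steady_clock never goes backwards
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `double ms = timeMs([&] { rc = decoder->DecodeFrameNoDelay(pSrc, iSrcLen, buffer, &bufInfo); });` times only the decode call, and a second `timeMs` call around any RGB conversion reports that cost on its own.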

tottaka, Mar 17 '23 21:03

I now have the decoder running at 40-50 ms per frame after turning down the encoder quality level and bitrate, which is much more acceptable.

tottaka, Mar 17 '23 23:03

Hi @tottaka,

Can you tell me how you turned down the encoder quality level and bitrate? Is there a flag, or where should I reduce the bitrate?

mohammedzakikochargi, May 03 '23 07:05

I'm not using Cisco's OpenH264 implementation for encoding; I'm using Broadcom's MMAL interface (Raspberry Pi). It seems you can set the encoding bitrate in Cisco's OpenH264 implementation through the iTargetBitrate field of the SEncParamBase struct, which is used to set up/initialize the encoder.

As for 'quality level', I can't help you much. I can only assume it's one of the RC_MODES enum values, which you would also supply to the iRCMode field of the SEncParamBase struct during setup/initialization. This may be wrong, but you can try it.

As I said, I've never used the encode function of this library.

Hope this helps :)

tottaka, May 04 '23 05:05

From my testing, this software codec simply isn't capable of the real-time live video (such as WebRTC) it is advertised for. It's just too slow. I have switched to pure GPU-based, hardware-accelerated decoding.

tottaka, May 04 '23 06:05

@tottaka, what are your PC's specs, and how did you build the decoder? I tried it on a MacPro 2018, and decoding takes <10 ms.

huili2, May 04 '23 06:05

Intel i7-4790K (Devil's Canyon), built with AutoBuildForWindows.bat (Win64-C-Only option). I suspected something was wrong, but no one responded when I asked in this issue what I was doing wrong. If I could get ~10 ms decode time, that would be great. For me, it seems to depend drastically on the encoded stream. I gave up on this codec while trying to decode 1920x1080 at 15 fps with a bitrate of 10,000,000. I believe these are the settings I was using here.
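For a sense of scale, the per-frame bit budget those settings imply can be worked out directly. This is a sketch assuming the 10,000,000 figure is in bits per second and that bits are spread evenly across frames (real encoders give I/IDR frames a much larger share than P frames); `frameBudget` is a hypothetical helper.

```cpp
#include <cassert>

// Average compressed size per frame for a constant-bitrate stream.
// Assumption: bits are spread evenly; real encoders give I frames more.
struct FrameBudget {
    double bitsPerFrame;
    double bytesPerFrame;
    double bitsPerPixel;
    double compressionRatio;  // raw I420 bytes / compressed bytes
};

FrameBudget frameBudget(int width, int height, double fps, double bitrateBps) {
    assert(width > 0 && height > 0 && fps > 0.0 && bitrateBps > 0.0);
    const double pixels = static_cast<double>(width) * height;
    // I420: full-resolution Y plane plus two quarter-size chroma planes.
    const double rawBytes = pixels * 1.5;
    FrameBudget b;
    b.bitsPerFrame = bitrateBps / fps;
    b.bytesPerFrame = b.bitsPerFrame / 8.0;
    b.bitsPerPixel = b.bitsPerFrame / pixels;
    b.compressionRatio = rawBytes / b.bytesPerFrame;
    return b;
}
```

`frameBudget(1920, 1080, 15.0, 10000000.0)` works out to about 666,667 bits (roughly 83 KB) per frame, about 0.32 bits per pixel, and only about 37:1 compression relative to a raw 3,110,400-byte I420 frame. More bits per frame means more entropy-decoding work per frame, which is consistent with the decode time dropping when the bitrate was reduced.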

tottaka, May 04 '23 06:05

Oh, you are using "C-Only", so the performance will certainly be really bad. I suggest using "Win64-ASM" instead of "Win64-C-Only".

huili2, May 04 '23 06:05

I will try in the morning. Is it really such a big difference? If this works, I will post an update. Thank you!

tottaka, May 04 '23 06:05

Sure. Hope to get good news then!

huili2, May 04 '23 06:05

I followed the meson build steps. Is that equivalent to Win64-ASM or to Win64-C-Only?

mohammedzakikochargi, May 04 '23 11:05

The ASM version seems to have given me about a 10 ms improvement per decoded frame. As mentioned here, it went from 40-50 ms per frame to around 30-40 ms per frame. Are there any other ways you can think of to increase performance even further?

EDIT: Scratch that. I forgot that my decode function also does RGB conversion, which adds roughly another 14 ms per decode. Timing only the DecodeFrameNoDelay call gives anywhere between 10 and 20 ms per decode. This is much more acceptable; thank you for suggesting the ASM version!
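As a correctness baseline for that colour-conversion step, here is a plain scalar I420 to packed RGB24 conversion. It is a sketch under stated assumptions: BT.601 limited-range coefficients in the usual 8-bit fixed-point form (swap in BT.709 coefficients if your stream uses that matrix, which is common for HD content), and separate luma/chroma strides such as the ones OpenH264 reports in `bufInfo.UsrData.sSystemBuffer.iStride[0]` and `iStride[1]`. `i420ToRgb24` is a hypothetical name. A scalar loop like this is not a fast path; a SIMD library such as libyuv, or doing the conversion in a GPU shader, is usually much quicker.

```cpp
#include <cassert>
#include <cstdint>

// Clamp an intermediate value to the 0..255 range of an 8-bit channel.
inline std::uint8_t clampToByte(int v) {
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// Scalar I420 -> packed RGB24, assuming BT.601 limited range (Y in 16..235).
// yStride / uvStride are bytes per row of each plane and may exceed the width.
void i420ToRgb24(const std::uint8_t* y, const std::uint8_t* u, const std::uint8_t* v,
                 int yStride, int uvStride, int width, int height,
                 std::uint8_t* rgb /* width * height * 3 bytes */) {
    assert(width > 0 && height > 0 && yStride >= width && uvStride >= (width + 1) / 2);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            // Each chroma sample covers a 2x2 block of luma samples.
            const int c = y[row * yStride + col] - 16;
            const int d = u[(row / 2) * uvStride + col / 2] - 128;
            const int e = v[(row / 2) * uvStride + col / 2] - 128;
            std::uint8_t* px = rgb + (static_cast<long>(row) * width + col) * 3;
            // Coefficients scaled by 256; +128 rounds before the shift.
            px[0] = clampToByte((298 * c + 409 * e + 128) >> 8);            // R
            px[1] = clampToByte((298 * c - 100 * d - 208 * e + 128) >> 8);  // G
            px[2] = clampToByte((298 * c + 516 * d + 128) >> 8);            // B
        }
    }
}
```

With neutral chroma (U = V = 128), Y = 16 maps to black and Y = 235 maps to white, which is a quick sanity check for any replacement implementation.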

tottaka, May 04 '23 23:05

With the meson build, ASM should be enabled by default if no specific options are added.

huili2, May 05 '23 01:05

Hi @huili2, when using the DecodeParser function on a 1920x1080 bitstream, I see latency of around 1-2 ms per frame, but for an I frame (keyframe) it goes up to 30 ms. I know that for this API call bParseOnly is true, so it skips the frame reconstruction part. Why, then, is the latency higher for an I frame, and is there any way to reduce it? FYI, I am using this to get motion vectors without frame reconstruction taking place.
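To check that the slow calls line up with keyframes, one option is to tag each timed input buffer by NAL unit type before handing it to `DecodeParser`. A minimal sketch, assuming the input is an H.264 Annex B byte stream (NAL units separated by `00 00 01` or `00 00 00 01` start codes): the low 5 bits of the byte after each start code give `nal_unit_type`, and type 5 marks a slice of an IDR picture. It does not detect non-IDR I frames; telling those apart from P frames would require parsing `slice_type` from the slice header, which this sketch does not do. `containsIdrSlice` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// nal_unit_type 5: coded slice of an IDR picture (H.264, Table 7-1).
constexpr int kNalIdrSlice = 5;

// Returns true if an Annex B buffer contains at least one IDR slice NAL unit.
// A 4-byte start code (00 00 00 01) ends in the 3-byte one, so one scan covers
// both; emulation prevention keeps 00 00 01 out of NAL payloads.
bool containsIdrSlice(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    for (std::size_t i = 0; i + 3 < len; ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            const int nalType = data[i + 3] & 0x1F;  // low 5 bits of NAL header
            if (nalType == kNalIdrSlice) return true;
            i += 2;  // continue scanning after the start code
        }
    }
    return false;
}
```

Recording `containsIdrSlice(pSrc, iSrcLen)` next to each measured latency makes it easy to average IDR and non-IDR timings separately.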

mohammedzakikochargi, May 05 '23 14:05

@mohammedzakikochargi, we have not paid much attention to this aspect. As you said, parse mode skips reconstruction, so ideally I frames should behave logically the same as P frames. Is the difference purely caused by the entropy decoding of IDR frames? You may want to investigate this :)

huili2, May 06 '23 01:05

Unrelated, but are you by any chance familiar with OpenGL? I'm currently trying to offload the decoded YUV420p data to an OpenGL fragment shader to display it on a full-screen quad. Have you done anything like this? I've tried multiple shader examples I found online, but I can't work out what is going wrong. No worries if you can't help; I just thought I'd ask. Also, feel free to close this issue, since the performance problem is now resolved. Thank you again for helping me with the initial issue :)

tottaka, May 06 '23 02:05

Can you give me any pointers on where specifically to look in the code?

mohammedzakikochargi, May 06 '23 18:05

Hi @huili2, based on your comment I went through the code and saw that, when the EntropyCoding flag is enabled, ParseIntraPredModeLumaCabac and ParseIntraPredModechromaCabac are called, and I observed that these are the cause of the latency. But I think this luma and chroma CABAC parsing should not be required in parseOnly mode, because frame reconstruction is skipped in that case. I tried simply skipping these function calls, but unfortunately it does not work that way. Can you please suggest a way to get rid of this luma and chroma parsing in parseOnly mode?

Any help will be appreciated. Thank you!

mohammedzakikochargi, May 12 '23 07:05

@mohammedzakikochargi, ParseOnly mode is currently designed for checking bitstream correctness, so it fully parses every syntax element. I don't think that matches your application's requirements. There's no easy way to skip this parsing; you would have to change each related function one by one. That is also what I would need to do on this path.

huili2, May 12 '23 08:05

@tottaka, there should be many working examples online of using OpenGL to render YUV I420 data. I'm not able to provide one here :)
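One common pitfall when uploading decoder output to textures is ignoring the row stride: the luma and chroma strides OpenH264 reports can be larger than the visible width, so treating a plane as tightly packed skews the image. A sketch of one way around it, copying each plane into a contiguous Y|U|V buffer before creating the three single-channel textures (Y at full size, U and V at half width and height, rounded up). `packPlane` and `packI420` are hypothetical helpers, not OpenH264 functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies `rows` rows of `rowBytes` bytes from a strided plane into `dst`,
// dropping any padding between rows. Returns the number of bytes written.
std::size_t packPlane(const std::uint8_t* src, int stride, int rowBytes, int rows,
                      std::uint8_t* dst) {
    assert(stride >= rowBytes && rowBytes > 0 && rows > 0);
    for (int r = 0; r < rows; ++r) {
        std::memcpy(dst + static_cast<std::size_t>(r) * rowBytes,
                    src + static_cast<std::size_t>(r) * stride, rowBytes);
    }
    return static_cast<std::size_t>(rowBytes) * rows;
}

// Packs I420 planes into one contiguous buffer: Y (width x height), then
// U and V (each ceil(width/2) x ceil(height/2)), with no row padding.
std::vector<std::uint8_t> packI420(const std::uint8_t* const planes[3], int yStride,
                                   int uvStride, int width, int height) {
    const int cw = (width + 1) / 2;
    const int ch = (height + 1) / 2;
    std::vector<std::uint8_t> out(static_cast<std::size_t>(width) * height +
                                  2 * static_cast<std::size_t>(cw) * ch);
    std::size_t off = packPlane(planes[0], yStride, width, height, out.data());
    off += packPlane(planes[1], uvStride, cw, ch, out.data() + off);
    packPlane(planes[2], uvStride, cw, ch, out.data() + off);
    return out;
}
```

With OpenH264 output, assuming the `SBufferInfo` field names from `codec_api.h`, the call would look like `packI420(buffer, bufInfo.UsrData.sSystemBuffer.iStride[0], bufInfo.UsrData.sSystemBuffer.iStride[1], bufInfo.UsrData.sSystemBuffer.iWidth, bufInfo.UsrData.sSystemBuffer.iHeight)` once `bufInfo.iBufferStatus == 1`. Where it is available (desktop OpenGL, OpenGL ES 3.0+), setting `GL_UNPACK_ROW_LENGTH` to the stride with `glPixelStorei` avoids the copy entirely.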

huili2, May 12 '23 08:05