
SvtHevcEncApp hangs and reports overflow and other errors.

Open IvanNablet opened this issue 3 years ago • 15 comments

The encoder crashes randomly with the following command line:

SvtHevcEncApp -hdr 0 -vbv-maxrate 7500032 -vbv-bufsize 9999872 -lad 0 -fps-num 25 -fps-denom 1 -rc 1 -intra-period 24 -bit-depth 10 -w 1920 -h 1080 -tbr 5000064 -i Lion1920x1080_W010_250Frms.raw -b Lion.265

Notes:

  • I downloaded and compiled (MSVC 2019) the latest sources on 2021-03-31, but I found the same problem with sources from 2018 as well.
  • Crashes happen with different video source files. I tested 1920x1080 and 3840x2160, I420 and W010.
  • I saw these crashes on 6 different computers with Intel CPUs.
  • Messages after a crash can be "divide by zero" or "Picture Decision Reorder Queue overflow", or the encoder hangs without any message.
  • I debugged the sources and found that the problems mostly start in PictureAnalysisKernel: the function EbGetFullObject() returns a picture object that has already been released.

Attached screenshots: 3errors, error_I420, error6, error8

IvanNablet avatar Mar 31 '21 10:03 IvanNablet

Hi @IvanNablet , I reproduced with your command line. Thanks for reporting. Will look into the issue.

tianjunwork avatar Mar 31 '21 15:03 tianjunwork

The above command line can be shortened to:

./SvtHevcEncApp -lad 0 -rc 1 -w 1920 -h 1080 -i ../../../../yuv/2048x1080_420_5.yuv -b out.bin -n 500

  • Hangs after outputting frame 461. This only happens with specific content (100% reproducible).
  • The crash happens randomly and rarely.

tianjunwork avatar Mar 31 '21 18:03 tianjunwork

The chance of a crash or hang depends heavily on the command-line parameters. With my command line I sometimes see crashes after 30 frames. BTW, Tianjun, you used a 2048x1080 source file and tried to encode it with "-w 1920 -h 1080". The output file will be strange.

IvanNablet avatar Apr 06 '21 08:04 IvanNablet

If you want to use the same file as me, my YUV file is here: https://cloud.nablettechnologies.com/index.php/s/HdAdEx6GfX8DX5g

IvanNablet avatar Apr 06 '21 08:04 IvanNablet

> The chance of a crash or hang depends heavily on the command-line parameters. With my command line I sometimes see crashes after 30 frames. BTW, Tianjun, you used a 2048x1080 source file and tried to encode it with "-w 1920 -h 1080". The output file will be strange.

This 2K input is there to reproduce the hang issue, which is 100% reproducible with it. Yes, the output is garbage, but the encoder should be able to process any data.
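Why the output is garbage can be seen from the frame sizes. Below is a minimal Python sketch, assuming planar 4:2:0 input with 8-bit samples (the bit depth of the 2048x1080 file is not stated in this thread) and assuming the application reads raw frames back to back using the size implied by -w and -h. The function name is illustrative and not part of SVT-HEVC.

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Bytes in one planar 4:2:0 frame: a full-size luma plane plus two
    chroma planes subsampled by 2 in each direction."""
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * bytes_per_sample


# Frame size actually stored in the file (assumed 8-bit samples).
actual = yuv420_frame_bytes(2048, 1080)
# Frame size the encoder is told to read via -w 1920 -h 1080.
assumed = yuv420_frame_bytes(1920, 1080)
# Bytes by which each read falls short of a real frame.
drift_per_frame = actual - assumed

print(actual, assumed, drift_per_frame)
```

Under these assumptions each read is 207360 bytes shorter than a real frame, so the read position slips further from the true frame boundaries with every frame and the encoder sees shifted, sheared picture content.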

tianjunwork avatar Apr 06 '21 15:04 tianjunwork

Dear Jun, is this a bug that will be worked on? Currently this pretty much stops the release of our transcoder. Is there any info that you can share with us?

nabletMuzzi avatar Apr 21 '21 12:04 nabletMuzzi

The crash is very random on my side. What is the schedule of your release?

tianjunwork avatar Apr 21 '21 14:04 tianjunwork

We are working on the contribution of the SVT-HEVC codec under our own nablet brand. Here is a press release with Lynn Comp, VP Visual Cloud Division at Intel Corp, that describes what we are doing. We have several customers waiting for our SDK, but with this bug it is difficult to release. We tried to find the bug ourselves for a couple of weeks and hope you can help.

nabletMuzzi avatar Apr 21 '21 16:04 nabletMuzzi

Thanks @nabletMuzzi. I debugged the hang issue; it is a regression introduced with the YUV 422/444 feature and is still under investigation. As for the crash, I only saw it once in hundreds of runs on Ubuntu skx. I need to reproduce it again for further investigation.

tianjunwork avatar Apr 21 '21 18:04 tianjunwork

Hi @nabletMuzzi, @IvanNablet, I am not sure whether you have evaluated how different -lad values affect the resulting bitrate relative to the target bitrate. In my experience, with -lad 0 and -rc 1 the rate control module cannot gather enough information to allocate bits for the output frames, so the resulting bitrate is not accurate.

For a final product, we strongly recommend disallowing -lad 0 when -rc 1 is set in your SDK. It would be a bad experience for an end user to get an inaccurate bitrate by accidentally setting -lad 0. Currently, SVT-HEVC is not designed for the kind of real-time pipeline in which an encoder can usually do a decent job with -lad 0.

The guidance on setting -lad is in the user guide: "When RateControlMode is set to 1 it's best to set this parameter to be equal to the Intra period value (such is the default set by the encoder). When CQP is chosen, then a (2 * minigopsize +1) look ahead is recommended."

The bug you reported can only be reproduced with -lad 0, not with the default value or with -lad 1 (which behaves much better in rate control while still keeping delay low, and is recommended for low end-to-end latency pipelines). Let me know what you think.
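The user-guide rule quoted above could be enforced in an SDK wrapper before the encoder is launched. The following is a minimal Python sketch, not SVT-HEVC code: the function names and the `mini_gop_size` parameter are hypothetical, and treating every rate control mode other than 1 as CQP is an assumption.

```python
def recommended_lad(rate_control_mode, intra_period, mini_gop_size):
    """Look-ahead distance suggested by the user-guide text quoted above.

    Assumption: any mode other than 1 is treated as CQP.
    """
    if rate_control_mode == 1:
        # Rate control on: match the intra period.
        return intra_period
    # CQP: 2 * minigopsize + 1.
    return 2 * mini_gop_size + 1


def validate_lad(rate_control_mode, lad):
    """Reject the combination reported in this issue (-rc 1 with -lad 0)."""
    if rate_control_mode == 1 and lad == 0:
        raise ValueError("-lad 0 with -rc 1 is not supported; increase -lad")
    return lad


# Settings from the reported command line: -rc 1 -intra-period 24.
# mini_gop_size=8 is only a placeholder; it is unused when the mode is 1.
print(recommended_lad(1, intra_period=24, mini_gop_size=8))
```

Raising an error rather than silently changing the value keeps the caller aware that the requested latency setting was not honored.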

tianjunwork avatar Apr 27 '21 05:04 tianjunwork

Thank you, Jun. This workaround works! We will increase the look-ahead distance.

IvanNablet avatar Apr 27 '21 08:04 IvanNablet

Hi Jun, thanks from me as well for your time on this and for helping us understand the root of the issue.

nabletMuzzi avatar Apr 27 '21 09:04 nabletMuzzi

Hi @IvanNablet @nabletMuzzi, glad to see the workaround works for you. Thank you for reporting the issue. I will mark it as a known issue.

Reproduce steps:

  1. Use the input video downloaded from Nablet:

     ./SvtHevcEncApp -vbv-maxrate 7500032 -vbv-bufsize 9999872 -lad 0 -fps-num 25 -fps-denom 1 -rc 1 -intra-period 24 -bit-depth 10 -w 1920 -h 1080 -tbr 5000064 -i ../../Lion1920x1080_W010_250Frms.yuv -b Lionnohang.265

  2. Clearing the caches makes it easier to reproduce the hang/PD overflow:

     echo 1 | sudo tee /proc/sys/vm/drop_caches
     echo 2 | sudo tee /proc/sys/vm/drop_caches
     echo 3 | sudo tee /proc/sys/vm/drop_caches
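Because the failure is intermittent, it helps to run the command many times and count the outcomes. The following is a minimal Python sketch of such a harness; the function names are illustrative, and it assumes a hang shows up as a timeout and a crash as a non-zero exit status. Whether the overflow error changes the exit status is not stated in this thread, so that case may need log inspection instead.

```python
import subprocess
import sys


def classify_run(cmd, timeout_s):
    """Run cmd once; return 'ok', 'crash' (non-zero exit) or 'hang' (timeout)."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "crash"


def tally_runs(cmd, runs, timeout_s):
    """Repeat the command and count how often each outcome occurs."""
    counts = {"ok": 0, "crash": 0, "hang": 0}
    for _ in range(runs):
        counts[classify_run(cmd, timeout_s)] += 1
    return counts


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # encoder command line from step 1, split into a list of arguments.
    demo_cmd = [sys.executable, "-c", "pass"]
    print(tally_runs(demo_cmd, runs=3, timeout_s=30))
```

The timeout should be chosen well above the normal encode time for the clip, otherwise slow runs are miscounted as hangs.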

Root cause (for later reference): see the log excerpt below. There are duplicated POC outputs from PD. POC 26 is missing, which causes a hang in the IRC kernel further down the pipeline.

    POC 25 PD OUT
    POC 30 PD OUT ==> should be POC 26
    POC 27 PD OUT
    POC 28 PD OUT
    POC 29 PD OUT
    POC 30 PD OUT
    POC 31 PD OUT

Below is what happened when POC 26 was processed from pictureDecisionReorderQueue:

    POC 30 PD IN, push: pictureNumber 26, preAssignmentBufferCount 0

The POC number 30 is wrong; its parentPcsobject is messed up. The current analysis shows that the data in the pictureDecisionReorderQueue object is corrupted, which causes the hang/overflow.
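The duplicated and missing POCs can be spotted mechanically in a debug log. The following is a minimal Python sketch, assuming log lines of the form `POC <n> PD OUT` as in the excerpt from this thread; the actual format in debug.zip may differ, and the function name is illustrative.

```python
import re
from collections import Counter

# Matches lines such as "POC 30 PD OUT" (format assumed from the excerpt).
PD_OUT = re.compile(r"POC\s+(\d+)\s+PD OUT")


def check_pd_output(log_text):
    """Return (duplicates, missing) POC numbers among the PD OUT lines."""
    pocs = [int(n) for n in PD_OUT.findall(log_text)]
    if not pocs:
        return [], []
    counts = Counter(pocs)
    duplicates = sorted(p for p, c in counts.items() if c > 1)
    # Every POC between the lowest and highest seen should appear once.
    missing = [p for p in range(min(pocs), max(pocs) + 1) if p not in counts]
    return duplicates, missing


log_excerpt = """\
POC 25 PD OUT
POC 30 PD OUT
POC 27 PD OUT
POC 28 PD OUT
POC 29 PD OUT
POC 30 PD OUT
POC 31 PD OUT
"""

duplicates, missing = check_pd_output(log_excerpt)
print("duplicated:", duplicates, "missing:", missing)
```

For the excerpt this reports POC 30 as duplicated and POC 26 as missing, which matches the analysis above.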

debug log patch and log: debug.zip

tianjunwork avatar Apr 27 '21 17:04 tianjunwork

It may be best to edit the issue title to be more relevant, so it's easier to find later when searching for this issue.

1480c1 avatar Apr 27 '21 18:04 1480c1

Another possible fix is to upgrade SVT-HEVC to the HEAD of the master branch.

inteltiger avatar May 12 '21 02:05 inteltiger