
libva h265 decode loops infinitely

Status: Open • cloudjiang2 opened this issue 3 years ago • 16 comments

System information

NUC11PAHi5, Ubuntu 20.x. Attachments: mfxlib.log, mfxtrace_10535.log

Issue behavior

Describe the current behavior

The FFmpeg QSV HEVC decoder loops infinitely on Linux (Ubuntu 20.x), but works without problems on Windows.

  1. The input source is an RTSP H.265 stream.
  2. The QSV HEVC decoder is selected.
  3. Decoding loops infinitely, calling MFXVideoCORE_SyncOperation again and again.

Describe the expected behavior

Decoding proceeds smoothly.

Debug information

  • The libva, libva-utils, gmmlib, media-driver, and Media SDK versions are all the latest (downloaded from GitHub and compiled by myself).
  • The traces are attached: a. text log trace: mfxlib.log; b. trace tool output: mfxtrace_10535.log

cloudjiang2, Dec 16 '21 04:12

I dug into the source code. Decoding one of the frames causes the VA call to block indefinitely in _studio/shared/umc/codec/h265_dec/src/umc_h265_segment_decoder_dxva.cpp at: sts = dxva_sd->GetPacker()->SyncTask(index, &surfErr); I replaced SyncTask with the VA SyncTask2. After the timeout, I tried to close the session, but it blocked at vaDestroyContext in _studio/shared/umc/io/umc_va/src/umc_va_linux.cpp.

cloudjiang2, Dec 20 '21 01:12

Hi @cloudjiang2 ,

Please:

  • attach input stream
  • attach dmesg log
  • try to reproduce the issue with sample_decode

dmitryermilov, Dec 20 '21 06:12

@dmitryermilov ,

I reproduced the issue with sample_decode:

  1. Save the RTSP video stream to test.h265 (see the attached testh265.zip; unzip it).
  2. Run: sample_decode h265 -i test.h265 -o out.yuv
  3. The mfxtrace is also attached. Attachments: mfxtrace_41041.log, testh265.zip

cloudjiang2, Dec 20 '21 11:12

BTW, the UMC::AutomaticUMCMutex guard(m_mGuard); in TaskBroker_H265::GetNextTask (file _studio/shared/umc/codec/h265_dec/src/umc_h265_task_broker.cpp) is unnecessary, because GetNextTaskInternal already takes this lock.

cloudjiang2, Dec 21 '21 01:12

@wangyan-intel, is this bug confirmed? Should I submit it to the media-driver group?

cloudjiang2, Dec 22 '21 03:12

@cloudjiang2 We will check and get back to you ASAP.

wangyan-intel, Dec 22 '21 03:12

@stellawuintel will check and update the status.

wangyan-intel, Dec 22 '21 07:12

@wangyan-intel, thanks.

cloudjiang2, Dec 22 '21 10:12

@cloudjiang2, I could not reproduce the issue on Tiger Lake Linux with the latest open-source MSDK. Could you please share the driver/MSDK version and platform on which you reproduced it, plus any environment variables you set? Also, does it reproduce stably, or does it vary from run to run? Attachment: Sample_decode_log.txt

Thanks & Regards, Stella

stellawuintel, Dec 23 '21 03:12

@stellawuintel

It reproduces stably on my newly bought NUC11PAHi5 (Xe). No problem was found on the NUC11PAHi3 or on 10th-gen machines. I compiled MSDK, libva, and media-driver myself, all from the latest code downloaded from GitHub. Decoding sometimes succeeds, but only on the first run after restarting the machine.

cloudjiang2, Dec 23 '21 10:12

Hi @cloudjiang2, I still cannot reproduce the error on Tiger Lake with the latest libva/driver/MSDK code, built manually. Could you please provide your kernel version, error log, sample_decode log, and dmesg log?

stellawuintel, Dec 24 '21 07:12

@stellawuintel I compiled libva, media-driver, and MSDK on Ubuntu 16.04 and run them on Ubuntu 20.04. Attachments: dmesg.log, sample_decode.log

  1. uname -a: Linux k-NUC11PAHi5 5.11.0-41-generic #45~20.04.1-Ubuntu SMP Wed Nov 10 10:20:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

  2. dmesg log: see attached

  3. sample_decode log: see attached

cloudjiang2, Dec 27 '21 01:12

@stellawuintel Happy New Year. Any progress?

cloudjiang2, Jan 05 '22 02:01

@cloudjiang2 I did not see any error in the log. In the bitstream, frames 22 through 50 are corrupt; is that expected? Have you checked whether the stream is still correct after converting it from RTP packets?

Thanks, Wayne Zhu

weizhu-intel, Jan 05 '22 03:01

@weizhu-intel, the original stream is an RTSP H.265 video stream from a Hikvision device. The problem is that MFXVideoCORE_SyncOperation always returns MFX_WRN_IN_EXECUTION when the timeout is reached. My understanding is that MFX_WRN_IN_EXECUTION means the call should be retried (as FFmpeg does), so it is retried again and again and the loop never ends. The same test.h265 stream decodes successfully with sample_decode.exe on Windows.

cloudjiang2, Jan 06 '22 07:01
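The retry behaviour described above (retrying for as long as the sync call reports "still in execution") can be contrasted with a bounded retry budget, which turns a hung task into a reportable error instead of an endless loop. The C sketch below is illustrative only and is not FFmpeg or Media SDK code: the status names, the sync_fn callback, sync_with_retry_budget, and the stub functions are all hypothetical. In real code the callback would wrap MFXVideoCORE_SyncOperation(session, syncp, wait), with SYNC_IN_EXECUTION standing in for MFX_WRN_IN_EXECUTION.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mfxStatus values; names and numbers are
   illustrative, not the Media SDK definitions. */
typedef enum {
    SYNC_OK = 0,            /* plays the role of MFX_ERR_NONE */
    SYNC_IN_EXECUTION = 1,  /* plays the role of MFX_WRN_IN_EXECUTION */
    SYNC_GAVE_UP = -1       /* hypothetical: retry budget exhausted */
} sync_status;

/* Hypothetical callback performing one wait of at most timeout_ms. */
typedef sync_status (*sync_fn)(void *ctx, unsigned timeout_ms);

/* Calls sync() at least once and retries only while it reports
   SYNC_IN_EXECUTION, up to max_attempts calls in total. Returns
   SYNC_GAVE_UP instead of spinning forever on a task that never ends. */
sync_status sync_with_retry_budget(sync_fn sync, void *ctx,
                                   unsigned timeout_ms,
                                   unsigned max_attempts,
                                   unsigned *attempts_out)
{
    unsigned attempts = 0;
    sync_status sts;

    assert(sync != NULL);
    do {
        sts = sync(ctx, timeout_ms);
        attempts++;
    } while (sts == SYNC_IN_EXECUTION && attempts < max_attempts);

    if (attempts_out != NULL)
        *attempts_out = attempts;
    return sts == SYNC_IN_EXECUTION ? SYNC_GAVE_UP : sts;
}

/* Hypothetical stub for a hung GPU task: never completes. */
static sync_status stub_never_completes(void *ctx, unsigned timeout_ms)
{
    (void)ctx;
    (void)timeout_ms;
    return SYNC_IN_EXECUTION;
}

/* Hypothetical stub that completes once the counter in *ctx reaches 0. */
static sync_status stub_completes_after(void *ctx, unsigned timeout_ms)
{
    unsigned *remaining = ctx;
    (void)timeout_ms;
    if (*remaining > 0) {
        (*remaining)--;
        return SYNC_IN_EXECUTION;
    }
    return SYNC_OK;
}
```

With stub_never_completes and a budget of 10, the function gives up after 10 calls rather than looping forever. A bounded budget only makes the hang visible, though. In this thread even vaDestroyContext blocked after the timeout, so a clean recovery may still depend on fixing the underlying driver-side stall.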

@cloudjiang2 The MFXVideoCORE_SyncOperation timeout is only the symptom. We need to debug to find out what actually happened.

A couple of questions:

  1. Have you checked whether the stream is correct? This is just to make sure it is not an erroneous stream.
  2. Is this issue seen only on your machine, or is it also seen on TGL?
  3. Please use intel_gpu_top to check the engine status.
  4. Please enable the kernel debug log and capture a new log. Command: echo 0xff > /sys/module/drm/parameters/debug

Thanks, Wayne Zhu

weizhu-intel, Jan 06 '22 12:01