libyami Intermittent failed case for VPP of CSC+Sharpness by using yamitranscode on Fedora and ubuntu::yakkety

Using the latest commit on master of yami and libva/intel-driver.

Test CMD:
yamivpp .//1920x1080.nv12 -s 59 ./1920x1080.yv12
yamivpp .//1920x1080.yv12 -s 59 ./1280x720.i420

Jul 20 '17 02:07 FocusLuo

vpp_clips.zip

Jul 20 '17 03:07 FocusLuo

We have set up the Fedora 25 env and are trying to reproduce the issue.

Jul 28 '17 00:07 xuguangxin

I have set up a Fedora 25 env on different APL machines. I built yami with the configure options found in the build log at http://media-ci.ostc.intel.com:8810/dashboard, and ran the test CMD above thousands of times. However, the issue did not occur. Next I will try to reproduce the issue with Docker.

Aug 01 '17 05:08 Zhziyao

What result are you expecting? I don't think this reported issue description tells the whole story.

The actual issue is that the output of the above test command is not always the same: the md5sum of the output intermittently changes from run to run (i.e. md5sum ./1920x1080.yv12 does not always give the same value).

I don't know how yamitranscode (mentioned in issue title) has anything to do with this, either.

Aug 01 '17 05:08 uartie

Also, when the md5sum result is not the expected one, I've seen an associated GPU hang on 4.10 and 4.11 kernels:

[23010.721025] drm/i915: Resetting chip after gpu hang
[23010.723370] [drm] RC6 on
[23010.724143] [drm] GuC firmware load skipped

Aug 01 '17 05:08 uartie

I'm able to reproduce at least once every ~200-300 runs sequentially

Aug 01 '17 06:08 uartie

md5sum of ./1920x1080.yv12 output should be f15e2b55a786fcf691f8e9d79e91653d
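
A minimal sketch of how this run-to-run comparison could be scripted. The helper name `count_md5_mismatches` and its argument order are hypothetical and not part of libyami; the sketch assumes a POSIX shell, coreutils `md5sum`, and that the command under test rewrites the output file on every run. The output file is deleted before each run so that a stale file from an earlier run cannot produce a false match.

```shell
# count_md5_mismatches EXPECTED_MD5 OUTPUT_FILE RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD RUNS times, hashes OUTPUT_FILE after each
# run, and prints the number of runs whose md5sum differs from EXPECTED_MD5.
# The body is a subshell "( ... )" so its variables do not leak to the caller.
count_md5_mismatches() (
    expected="$1"; output="$2"; runs="$3"
    shift 3
    mismatches=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Remove any previous output so a failed run cannot look like a pass.
        rm -f -- "$output"
        # Run the command under test; its console output is discarded.
        "$@" >/dev/null 2>&1 || true
        # md5sum prints "<hash>  <file>"; keep only the hash.
        actual="$(md5sum "$output" 2>/dev/null | cut -d' ' -f1)" || actual=""
        if [ "$actual" != "$expected" ]; then
            mismatches=$((mismatches + 1))
            echo "run $i: got ${actual:-no output}, expected $expected" >&2
        fi
        i=$((i + 1))
    done
    echo "$mismatches"
)
```

For the first test command, a call might look like `count_md5_mismatches f15e2b55a786fcf691f8e9d79e91653d ./1920x1080.yv12 300 yamivpp .//1920x1080.nv12 -s 59 ./1920x1080.yv12`; the run count of 300 is only an illustrative choice based on the reproduction rate mentioned above.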

Aug 01 '17 06:08 uartie

@uartie Thank you for your detailed explanation. I understand the issue much more clearly now.

Aug 01 '17 06:08 Zhziyao

@uartie, ziyao used md5sum to check the command result. He can't reproduce it on an APL machine. Is it possible that it is related to the CPU stepping? Could you share your CPU stepping with ziyao by mail, so he can compare the CPU info?

Aug 01 '17 06:08 xuguangxin

use "lspci -nn |grep VGA"

Aug 01 '17 06:08 xuguangxin

00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5a85] (rev 0b)
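
To compare steppings across machines without reading the line by eye, the device ID and revision can be pulled out of this output. The following is only a sketch: `parse_vga_id` is a hypothetical helper name, and it assumes the `lspci -nn` line has the same layout as the one shown above, including a trailing `(rev ..)` field. Lines without a revision field are skipped.

```shell
# parse_vga_id: hypothetical helper that reads `lspci -nn` output on stdin
# and prints "<vendor>:<device> <revision>" for each VGA controller line.
# The class code in brackets (e.g. [0300]) has no colon, so only the
# vendor:device pair matches the first capture group.
parse_vga_id() {
    sed -n -E 's/.*VGA compatible controller.*\[([0-9a-f]{4}:[0-9a-f]{4})\] \(rev ([0-9a-f]+)\).*/\1 \2/p'
}
```

Typical use would be `lspci -nn | parse_vga_id`, which for the line above would give `8086:5a85 0b`.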

Aug 01 '17 06:08 uartie

OK, just checked: we do not have rev 0b. @uartie, do you have another stepping? We also checked the kernel version: we use Fedora 25 with 4.8.6-300.fc25.x86_64, which is not the same as your kernel version. What OS version are you using?

Aug 01 '17 08:08 xuguangxin

@xuguangxin, no I don't have another stepping locally. We use Fedora 25 host with updated kernel (via dnf package manager) and Ubuntu Xenial (16.04) host with updated kernel (via apt package manager).

Please try to update your Fedora 25 packages (including kernel) via dnf update and see if that can reproduce afterwards.

Aug 01 '17 15:08 uartie

Sorry for not explaining my former work clearly.

I updated the kernel to the latest version and ran the test on the APL machine.
Besides, I installed Docker and pulled the Fedora 25 image from the Intel repo. I set up the env with RETOOL, then ran the test in the Fedora 25 container. However, the issue did not occur under either condition. I also saved the output of dmesg | grep -i gpu after each loop, but found no "GPU HANG" message.
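
The per-loop kernel log check could be reduced to a count, so that a change between iterations is easy to spot. This is a rough sketch, assuming the hang is reported with the words "gpu hang" in some letter case, as in the log excerpt quoted earlier in this thread; `count_gpu_hangs` is a hypothetical helper name.

```shell
# count_gpu_hangs: hypothetical helper that reads kernel log text on stdin
# and prints how many lines mention a GPU hang (case-insensitive).
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function usable under "set -e" while still printing "0".
count_gpu_hangs() {
    grep -i -c 'gpu hang' || true
}
```

Running `dmesg | count_gpu_hangs` before and after each loop iteration and comparing the two numbers would show whether a new hang was logged during that iteration. On some systems reading `dmesg` requires root privileges.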

Aug 04 '17 02:08 Zhziyao

It seems to be a stepping-specific issue. Sadly, U.Artie's stepping is higher than Ziyao's. Let us find a machine with stepping rev 0b.

Aug 04 '17 02:08 xuguangxin

It does appear to be a stepping issue: I can reproduce the issue on the machine supplied by uartie.

Aug 10 '17 04:08 Zhziyao

Ok, please continue root-causing on the APL I've supplied you.

Aug 10 '17 22:08 uartie

@Zhziyao, @xuguangxin this issue shows up on BSW, too. It would be strange for both APL and BSW to be affected by a stepping issue.

00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:22b1] (rev 21)

Aug 14 '17 15:08 uartie

@Zhziyao, could you find a BSW machine to reproduce this issue?

Aug 15 '17 03:08 xuguangxin

@Zhziyao , any update on this?

Aug 17 '17 07:08 xuguangxin

I can't reproduce this issue on BSW either. I have just finished setting up the test env on another machine. I wonder whether there is some difference between uartie's test env and mine that is preventing me from reproducing the issue. I will provide my host machine address to uartie on Slack for checking.

Aug 17 '17 07:08 Zhziyao

Any progress with identifying/reproducing this issue on your end? I am attaching the i915_error_state generated when the GPU hang occurs.

i915_error_state.gz

Feb 01 '18 18:02 uartie
