libyami
Intermittent failure in the VPP CSC+Sharpness case using yamitranscode on Fedora and ubuntu::yakkety
Using the latest commit on master of yami and libva/intel-driver.
Test CMD:
yamivpp .//1920x1080.nv12 -s 59 ./1920x1080.yv12
yamivpp .//1920x1080.yv12 -s 59 ./1280x720.i420
We have set up a Fedora 25 environment and are trying to reproduce the issue.
I have set up Fedora 25 environments on different APL machines. yami was built with the configure options found in the build log at http://media-ci.ostc.intel.com:8810/dashboard. I ran the test command above thousands of times, but the issue did not appear. Next I will try to reproduce it with Docker.
What result are you expecting? I don't think this reported issue description tells the whole story.
The actual issue is that the output of the above test command is not always the same. The test compares the md5sum of the output, and that value intermittently changes from run to run (i.e. md5sum ./1920x1080.yv12 is not always the same).
I don't know how yamitranscode (mentioned in issue title) has anything to do with this, either.
Also, when the md5sum result is not the expected one, I've seen an associated GPU hang on 4.10 and 4.11 kernels:
[23010.721025] drm/i915: Resetting chip after gpu hang
[23010.723370] [drm] RC6 on
[23010.724143] [drm] GuC firmware load skipped
I'm able to reproduce it at least once every ~200-300 sequential runs.
md5sum of ./1920x1080.yv12 output should be f15e2b55a786fcf691f8e9d79e91653d
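The run-and-compare procedure described above could be automated roughly as follows. This is only a sketch, assuming Python 3 on the test host; the helper names `md5_of_file` and `count_mismatches` are hypothetical and not part of libyami, and the command and output path are supplied by the caller.

```python
import hashlib
import os
import subprocess


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_mismatches(cmd, output_path, expected_md5, runs):
    """Run cmd `runs` times; return the 1-based indices of runs whose
    output is missing or has an MD5 different from expected_md5."""
    mismatches = []
    for i in range(1, runs + 1):
        # Remove stale output so a failed run cannot pass with an old file.
        if os.path.exists(output_path):
            os.remove(output_path)
        subprocess.run(cmd, check=False)
        if (not os.path.exists(output_path)
                or md5_of_file(output_path) != expected_md5):
            mismatches.append(i)
    return mismatches
```

For the case in this thread, the call would presumably pass the first yamivpp command as an argument list, `./1920x1080.yv12` as the output path, the md5sum quoted above as `expected_md5`, and a few hundred runs.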
@uartie Thank you for the detailed explanation. I understand the issue much more clearly now.
@uartie, ziyao used the md5sum to check the command result, and the issue can't be reproduced on his APL machine. Could it be related to the CPU stepping? Could you share your CPU stepping with ziyao by mail, so he can compare the CPU info?
Use "lspci -nn | grep VGA".
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5a85] (rev 0b)
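To compare steppings across machines without reading the output by eye, the vendor ID, device ID and revision can be pulled out of a line like the one above. A minimal sketch, assuming the `lspci -nn` line format shown in this thread; `parse_lspci_vga` is a hypothetical helper name.

```python
import re

# Matches "[vvvv:dddd]" optionally followed by "(rev xx)".
# The class code "[0300]" has no colon, so it is not matched.
_LSPCI_ID = re.compile(
    r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\](?:\s+\(rev ([0-9a-fA-F]{2})\))?"
)


def parse_lspci_vga(line):
    """Return (vendor_id, device_id, revision) from one `lspci -nn` line.

    revision is None when the line has no "(rev xx)" suffix.
    Returns None if no "[vendor:device]" pair is found.
    """
    match = _LSPCI_ID.search(line)
    if match is None:
        return None
    vendor, device, rev = match.groups()
    return (vendor.lower(), device.lower(), rev.lower() if rev else None)
```

Two machines then count as the same stepping only when all three fields match.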
OK, just checked: we do not have rev 0b. @uartie, do you have another stepping? We also checked the kernel version: on Fedora 25 we use 4.8.6-300.fc25.x86_64, which differs from yours. What OS version are you using?
@xuguangxin, no, I don't have another stepping locally. We use a Fedora 25 host with an updated kernel (via the dnf package manager) and an Ubuntu Xenial (16.04) host with an updated kernel (via the apt package manager).
Please try updating your Fedora 25 packages (including the kernel) via dnf update and see whether the issue reproduces afterwards.
Sorry for not explaining my earlier work clearly.
- I updated the kernel to the latest version and ran the test on the APL machine.
- I also installed Docker, pulled the Fedora 25 image from the Intel repo, set up the environment with RETOOL, and ran the test in the Fedora 25 container.
The issue did not appear under either condition. After each loop I also saved the output of
dmesg | grep -i gpu
but found no "GPU HANG" message.
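The per-loop dmesg check could be done on captured text rather than by eye. The sketch below assumes the dmesg output has already been captured as a string, for example before and after a test loop; `gpu_hang_lines` and `new_gpu_hang_lines` are hypothetical helper names.

```python
import re

# Case-insensitive, so it matches both "gpu hang" and "GPU HANG".
_GPU_HANG = re.compile(r"gpu hang", re.IGNORECASE)


def gpu_hang_lines(dmesg_text):
    """Return the dmesg lines that mention a GPU hang."""
    return [line for line in dmesg_text.splitlines() if _GPU_HANG.search(line)]


def new_gpu_hang_lines(before_text, after_text):
    """Return GPU-hang lines present in after_text but not in before_text.

    dmesg lines carry timestamps, so a repeated hang shows up as a new line.
    """
    seen = set(gpu_hang_lines(before_text))
    return [line for line in gpu_hang_lines(after_text) if line not in seen]
```

Comparing snapshots taken before and after each run ties a hang to the specific run that produced a mismatching md5sum.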
It seems to be a stepping-specific issue. Unfortunately, U.Artie's stepping is higher than Ziyao's. Let us find a machine with stepping rev 0b.
It surely is a kind of stepping issue. I can reproduce the issue on the machine supplied by uartie.
Ok, please continue root-causing on the APL I've supplied you.
@Zhziyao, @xuguangxin this issue shows up on BSW, too. It would be strange for both APL and BSW to be affected by a stepping issue.
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:22b1] (rev 21)
@Zhziyao, could you find a BSW to reproduce this issue?
@Zhziyao, any update on this?
I can't reproduce this issue on BSW either. I have just finished setting up the test environment on another machine. I wonder whether some difference between uartie's test environment and mine is preventing me from reproducing the issue. I will send my host machine's address to uartie on Slack so he can check.
Any progress with identifying/reproducing this issue on your end? I am attaching the i915_error_state generated when the GPU hang occurs.