libva icon indicating copy to clipboard operation
libva copied to clipboard

regression found by drm: remove VA_DRM_IsRenderNodeFd() helper

Open XinfengZhang opened this issue 2 years ago • 9 comments

the case is chrome out of process decode case, after this patch 4d4c5f06519fe4c40d6fe1b2ccff812ff3b3b402 it will break the initialization process, suppose it is because the check in drmGetNodeTypeFromFd is strict than if (fstat(fd, &st) == 0) return S_ISCHR(st.st_mode) && (st.st_rdev & 0x80);

ps: int drmGetNodeTypeFromFd(int fd) { struct stat sbuf; int maj, min, type;

if (fstat(fd, &sbuf))
    return -1;

maj = major(sbuf.st_rdev);
min = minor(sbuf.st_rdev);

if (maj != DRM_MAJOR || !S_ISCHR(sbuf.st_mode)) {
    errno = EINVAL;
    return -1;
}

type = drmGetMinorType(min);
if (type == -1)
    errno = ENODEV;
return type;

} @evelikov-work

XinfengZhang avatar Mar 01 '23 13:03 XinfengZhang

suppose I will revert this patch to release 2.17.1 if no other concern. also , I will analyze the issue further , because I could not reproduce the issue in my local side, may need some time to find the rootcause.

XinfengZhang avatar Mar 01 '23 13:03 XinfengZhang

The function has been used by mesa for years, so I don't quite see how it causes a problem.

I'm not claiming it's perfect, but I don't see how. Can you provide some helpful information for me to debug?

evelikov avatar Mar 01 '23 19:03 evelikov

Also note that maj != DRM_MAJOR has been replaced by drmNodeIsDRM() with version 2.4.96, back in 2018 - some 5 years ago. Which is available in Debian "oldstable" (buster) and Ubuntu 18.04 (bionic).

Which distribution is being affected?

evelikov avatar Mar 01 '23 19:03 evelikov

Here are some debug logs i put in libva and got this after revert of the patch. by running in chrome OOPV(out of process video) tests.

Debug code. (I reverted the patch and added manually drmGetNodeTypeFromFd call to check the difference in return of both in this test scenario)

+    int node_type;
 
     if (fd < 0 || (is_render_nodes = VA_DRM_IsRenderNodeFd(fd)) < 0)
         return NULL;

+    printf("libva: vaGetDisplayDRM : VA_DRM_IsRenderNodeFd(%d) : %d\n",fd, is_render_nodes);
+    if (fd < 0 || (node_type = drmGetNodeTypeFromFd(fd)) < 0) {
+    }
+    printf("libva: vaGetDisplayDRM : drmGetNodeTypeFromFd(%d) : %d\n",fd, node_type);

output logs: /var/log/ui/ui.20230302-044921:90:libva: vaGetDisplayDRM : VA_DRM_IsRenderNodeFd(14) : 1 /var/log/ui/ui.20230302-044921:92:libva: vaGetDisplayDRM : drmGetNodeTypeFromFd(14) : -1

both the function returns differently with same fd. VA_DRM_IsRenderNodeFd return success as 1. but drmGetNodeTypeFromFd return -1 for same fd.

kumarsac avatar Mar 02 '23 04:03 kumarsac

@kumarsac are you the original reporter, or you're another person also seeing the issue? Can you help out with a few questions:

  • Distribution name and version - eg Ubuntu 18.04, or custom distribution based on Debian Bookworm
  • libdrm version and origin - eg provided by distro, or other (provide links to build recipe and source code)
  • i915 driver version and origin - eg. provided by distro with kernel v.1.2.3, or other (provide links to build recipe and source code)

In addition to that can you provide the output of:

  • stat /dev/dri/*
  • ls /sys/dev/char/*/device/drm

Thanks o/

evelikov avatar Mar 02 '23 13:03 evelikov

Bear in mind the original VA_DRM_IsRenderNodeFd() is badly broken, since drmGetDeviceNameFromFd returns either NULL or the card node-name... while we assume it might return the render one.

evelikov avatar Mar 02 '23 13:03 evelikov

hi @evelikov , it is a chromeOS usage, libdrm version is new one, suppose @kumarsac could provide the details. but we debugged it yesterday, it is a permission problem OOPV could call fstat with same fd , and return success but could not call stat, the errno is EACCES chrome stack is using sandbox , main gpu process should have all permission , we also print pid uid euid. an found they are same with main process initialization , but still failed. so, it should releated with permission setting in chromeOS.

yes, agree that drmGetDeviceNameFromFd will just return /dev/dri/cardx not render node. it is useless maybe we just need: if (fstat(fd, &st) == 0) return S_ISCHR(st.st_mode) && ((st.st_rdev & 0xff) >> 6) == 2);

XinfengZhang avatar Mar 03 '23 05:03 XinfengZhang

libdrm chrome using is 2.4.110. As @XinfengZhang mentioned its permission issue. (EPERM (-1) when we call stat(path, @sbuf ) in libva case its fstat with fd, (No issue with this).

code:

drmNodeIsDRM(int maj, int min) this api checks the path,
    char path[64];
    struct stat sbuf;

    snprintf(path, sizeof(path), "/sys/dev/char/%d:%d/device/drm",
             maj, min);
    return stat(path, &sbuf) == 0;

kumarsac avatar Mar 03 '23 07:03 kumarsac

When you say chromeOS - which version is that? IIRC there are 3-4 variants available.

Mind you I'm not sandbox expert, so take the following with a pinch of salt.

Chromium applies different sandbox policy for GPU/Decode/Encode

If you follow all the use_XXX_specific_policies references across the codebase - you can deduce where and how, plus there is also per-vendor (AMD/Intel/etc) variance.

From a quick look these should be enabled for both AMD and Intel

Overall the various details are split quite drastically across the tree, so you might need more than just the above.

As a whole, I think most ~~of~~ *if not all the AMD/Intel/Virtio/ARM should be unified (modulo the driver-names), since they all work on the same standard interfaces and come from the same stack.

At the moment Intel is playing catch-up and as you can see it's costing your team.

evelikov avatar Mar 03 '23 14:03 evelikov