level-zero ZES_ENABLE_SYSMAN vs multiple layers using L0

ZES_ENABLE_SYSMAN vs multiple layers using L0

Open bgoglin opened this issue 3 years ago • 35 comments

Hello

I am porting hwloc to L0 for exposing the locality of GPU devices. Things worked fine with API 0.91 but porting to 1.0 doesn't look good.

I am listing devices with zeDriverGet() and zeDeviceGet() and then calling zesDevicePciGetProperties() and zesDeviceGetProperties() after casting into zes_device_handle_t. As documented in 1.0 doc, this only works if ZES_ENABLE_SYSMAN=1 in the environment. This looks like a terrible idea.

Requiring a third-party library to always modify the environment of a program isn't nice. It's ok when enabling some debugging in a library from time to time, but not when enabling a mandatory feature. I . Passing a flag to zeInit() would be better.
This won't work if multiple layers use L0 at the same time: an application using L0 (most likely a runtime system managing tasks) will call zeInit() without ZES_ENABLE_SYSMAN (because it doesn't need it). It will then call hwloc to get the locality of that some ze devices (to select the local one, etc). hwloc will fail because sysman wasn't enabled in L0 (it will actually segfault in the current implementation). hwloc could set the env var and call its own zeInit() too but it's likely too late anyway. Given that all drivers/devices are shared between all these layers (which is good), the behavior of L0 will basically change depending on which layer calls zeInit() first, that's unpredictable and bad.

0.95 had zetSysmanGet() for converting between ze and zet devices. 1.0 should just do the same: have an explicit function to enable sysman and "cast" a ze device handle to zes. The env var of init flags won't work just like multiple MPI_Init() don't work in MPI.

Aug 21 '20 12:08 bgoglin

I fully agree with your comment and experience a similar issue. Moreover, on Windows using the most recent drivers calling any zes function without having ZES_ENABLE_SYSMAN=1 set will result in a crash in the API due to a null-pointer use.

Sep 21 '20 10:09 malikm

@bgoglin I would like to try out your hwloc with L0 support when you are ready to share.

Sep 21 '20 13:09 rscohn2

@rscohn2 The code is in the levelzero branch of the main repo https://github.com/open-mpi/hwloc/tree/levelzero Tarballs are available at https://ci.inria.fr/hwloc/job/basic/job/levelzero/ It works on my laptop with a single integrated GPU (it creates a zeX object inside my GPU PCI device). Next step is to add information about links between GPUs. Set C_INCLUDE_PATH=/usr/local/include/levelzero and LD_LIBRARY_PATH=/usr/local/lib/ and configure with --enable-levelzero --disable-netloc

Sep 21 '20 14:09 bgoglin

@malikm I have a fix that may resolve the windows issue you described. Currently the loader is loading all drivers at startup. My fix moves the driver load into the zeInit path. This will allow a windows application to do a putenv before calling zeInit and have sysman load correctly. I will post my fix soon.

Sep 21 '20 14:09 bmyates

@bgoglin This has sparked considerable internal discussion over here. Please close this issue - I will discuss this with you offline.

Dec 07 '20 23:12 rhc54

Before the issue is closed, I would like to state that I am following it with attention...

Dec 07 '20 23:12 Kerilk

My apologies - I didn't mean to sound like there was some mysterious or sinister reason behind closing it. The bottom line is that this isn't going to be fixed, but a workaround has been identified. In brief, the architectural intent behind L0 assumed that application processes would not need L0/Sysman support. Sysman pulls in a lot of dependencies and the team wanted L0 to be lightweight.

Thus, the L0 design calls for apps to only use the bare minimum "core" L0 functions, and that only system-level entities would use the broader capabilities that require Sysman. This is not the case for HPC applications that want to see the device capabilities, but it is "baked in" to the L0 architecture.

The workaround is to have the local daemon's HWLOC instance initialize the full L0/Sysman support, and then include the device information in HWLOC's structures that it shares with the local procs (either via HWLOC's shmem support or by passing it down as XML). This gives the application full access to the GPU's capabilities without the app initializing Sysman.

It leaves open the question of an app wanting to access the dynamic information via L0 that also requires Sysman - e.g., current energy usage. For those interfaces, we will create PMIx definitions that will allow the app to obtain the info via the PMIx_Query interface. This interface relays the request to the local daemon, which can then use its Sysman-enabled L0 library to obtain the info and return it to the requestor. We currently do this with a variety of things, so this is a simple extension.

We will be working with @bgoglin and the PMIx community (which I lead anyway) to get these services into place. Meantime, the advice is for applications and libraries that access L0 to not initialize L0 with Sysman. It is only the first call to zeinit that matters, and I know that creates a bit of a race condition as to who makes that call - thus, it is probably good practice for someone to do it early (before initializing other libraries) to ensure it gets done correctly.

This also means that libraries should test to see if Sysman was loaded prior to assuming it is present - I would suggest a "dlsym" check to ensure the Sysman functions are available so you avoid the segfault if not.

Dec 08 '20 00:12 rhc54

Thanks for the detailed answer.

The problem is not limited to the Sysman layer. Any tools in the table here: https://spec.oneapi.com/level-zero/latest/tools/PROG.html would be affected as well. How would those be dealt with?

This also means that libraries should test to see if Sysman was loaded prior to assuming it is present - I would suggest a "dlsym" check to ensure the Sysman functions are available so you avoid the segfault if not.

Given that the loader always export the symbols, I am pretty sure this will not work...

Dec 08 '20 00:12 Kerilk

I should think the app would be safe to access those as they don't involve Sysman, but with the same problem regarding who init's the library first. If an app wants to use those features, I would strongly suggest they immediately call zeinit with the appropriate envar setting.

I'm not sure you are correct about the loader, but this is subject to investigation at this point. I was only trying to suggest other possible avenues for avoiding the segfault as it is unlikely to be fixed.

Dec 08 '20 02:12 rhc54

@Kerilk dlsym will indeed not work, both L0 and Sysman symbols are in the same libze_loader.so. A simple program calling zeDeviceGet() and passing it to zesDeviceGetProperties() will segfault unless ZES_ENABLE_SYSMAN=1. This function should return an error instead of segfaulting. That's likely 2 lines of code, I'll see if I can submit a PR. There is no way to know ZES_ENABLE_SYSMAN=1 was set during the first zeInit().

Dec 08 '20 15:12 bgoglin

Patch above is totally untested since I don't know how to recompile L0 but it seems simple enough to me...

Dec 08 '20 15:12 bgoglin

Why this control is environment variable instead of e.g. an extra flag one can give for zeInit()?

(Extra flag should be backwards compatible change assuming L0 ignores unknown flag bits.)

Mar 10 '21 16:03 eero-t

Complete agree with either zeInit() having a flag for initializing SYSMAN or zesInit() to initialize. Environment variables are intended to change the behavior of code at runtime. In this case, using SYSMAN code without setting ZE_ENABLE_SYSMAN results in code that doesn't work...

Jun 15 '21 20:06 kevin-harms

Something like SDL does would be nice: https://wiki.libsdl.org/SDL_Init

Jun 16 '21 09:06 eero-t

Is there any progress on this? We've been using ugly hacks to define ZES_ENABLE_SYSMAN=1 as early as possible in our library. However every couple months, a new issue is discovered on our side because, as explained above, modifying environment variables within a library is a bad idea. We really need a correct solution on your side, either with zeInit() flags as explained above or anything without environment variable. Thanks.

Jan 29 '22 22:01 bgoglin

Given that it has been nearly two years without Intel fixing this problem, I am inclined to think they never will no matter how many customers and 3rd-party developers complain about it. It therefore seems we have two options:

just refuse to support Intel GPUs. Fortunately, they aren't that pervasive in the world, so this wouldn't enormously impact the user community. Seems a shame to do it, but if we cannot reliably support them, then this might be our best choice.
fork the level-zero repository and fix it ourselves. Maybe call it "level-zero-plus"? Would require that we periodically sync with this repo, but that isn't an overwhelming burden. Negative would be causing some confusion out there, but if we explain why it is necessary, I suspect the packagers would accept the alternative releases.

Obviously, the best solution would be for Intel to fix the bloody library - but it seems that just isn't possible. Of the above options, I'd vote for the first one and simply generate a big warning stating why we aren't supporting Intel GPUs, pointing the user to this ticket. However, if people want to pursue the second, I'm happy to do the fork and occasionally sync it.

Jan 31 '22 18:01 rhc54

Maybe easier to start with a pull request for fixing the issue?

Feb 01 '22 11:02 eero-t

We tried that - didn't go anywhere.

Feb 01 '22 14:02 rhc54

We tried that - didn't go anywhere.

@rhc54 Which PR that is? I do not see it either in open or closed PRs.

Feb 01 '22 14:02 eero-t

hi everyone. So first, thanks for your interest and input here. A few things:

https://spec.oneapi.io/level-zero/0.91/tools/api.html#zetinit

Actually, we had the earlier versions of the API with a specific interface to initialize sysman. see here: https://spec.oneapi.io/level-zero/0.91/tools/api.html#zetinit

Idea was that users would call zetInit to initialize sysman and tools API. This was removed in v1.0, if I remember correctly, because there were suggestions that having an environment variable instead of two initialization functions zeInit and zetInit was better. From reading this post, it seems that was not completely the case :)

Now, we are working on moving some more Sysman queries to Core, so that way users dont need to call Sysman (see here for instance https://spec.oneapi.io/level-zero/latest/core/EXT_PCIProperties.html?highlight=pci#pci-properties-extension). However, that doesn't mean that every Sysman interface would be moved to Core.

So main question here: are we proposing eliminating the SYSMAN env variable because we want to access some specific sysman interfaces? on which case we could just consider maybe moving them to Core to avoid the use of the env var.

Or is the proposal because we are thinking more in the long term for anything that Sysman might provide?

Feb 01 '22 15:02 jandres742

@jandres742 If all non-sensitive queries were available in Core, and Sysman became some sort of "admin" interface for reading sensitive information (not sure if that exists) and configuring hardware/drivers, it would be fine. Right now, on the query side, some things are in Core, some are in Sysman, it's not clear why.

Feb 01 '22 16:02 bgoglin

So main question here: are we proposing eliminating the SYSMAN env variable because we want to access some specific sysman interfaces? on which case we could just consider maybe moving them to Core to avoid the use of the env var.

@jandres742 Different users need different Sysman interfaces, and you cannot know all the use-cases (library vs. app, what data is needed, and with what privileges things get run with) beforehand. Using environment variable to enabled them for anything else than for debugging purposes, is just a bad idea.

What's the problem in supporting init flag? It should(?) be fairly small, backwards compatible API improvement, and the environment can still also be used to enable Sysman.

Feb 01 '22 17:02 eero-t

@bgoglin

Right now, on the query side, some things are in Core, some are in Sysman, it's not clear why.

Right. That's because we have moved to Core only those for which we have found the need. Not sure whether every query on Sysman need to be on Core to be used by Compute applications.

@eero-t

What's the problem in supporting init flag? It should(?) be fairly small, backwards compatible API improvement, and the environment can still also be used to enable Sysman.

it's not a problem, but more of understanding the requirements for all users. As mentioned above, we started to use the environment variable only in 1.0, whereas in previous releases we had a specific interface for that, which aligns with the idea of the flag. So the main issue here I would say is that we are trying to define something that it is applicable for most cases, and that is stable. As you say, using a flag is a way of doing that.

W are discussing internally how we could incorporate the suggestions from this post and provide an update then.

Feb 01 '22 20:02 jandres742

My only comment would somewhat echo that from @eero-t. It feels like you would have to support all the sysman queries as there really is no way to anticipate what people will need - seems like every new programming model needs something more than the last one. Might find yourselves chasing the goal, and eventually wind up simply replicating all of sysman other than the controls. Allowing the user to decide if and when to pull sysman into memory feels like the most cost-effective and customer-friendly solution.

Feb 01 '22 20:02 rhc54

Hi guys - the viewpoint that we are converging on is that L0 sysman is to be used for managing assets (i.e. fan speed, power envelopes, ECC status etcc....) whereas L0 core is to be used for querying system properties typically required for compute as well as actually launching work etc.. So, for example, a query to get the B/D/F address of a device - used for determining which HCA/Ethernet adaptor to use to move a buffer from the device to some remote - is justifiably a compute related query and should belong in L0 core. On the other hand, querying how many fans a device has (even though it is a pure query and not a request to modify some property - is probably not useful to someone doing compute and should probably remain in L0 sysman. Obviously, we won't catch every property on day 1, but what we can do is move the ones that are clearly compute-related (e.g. fabric topology queries etc...) to core and let users file tickets to move the ones that we miss from sysman -> core. How does that sound?

Feb 02 '22 18:02 AstroVPK

@AstroVPK This looks good to me.

FYI, what hwloc currently uses in L0 sysman and would like to have in L0 core is:

zesDeviceGetProperties for vendor name, model name, etc
zesDevicePciGetProperties for BDF and PCI link total bandwidth
zesMemoryGetProperties for physical size, subdevice location and type (type is more precise than in L0 core), no need about individual module info if there are multiple modules with same kind and location
zesMemoryGetState because the doc says sometimes the memory size is valid there but 0 in zesMemoryGetProperties
zesDeviceEnumFabricPorts+zesFabricPortGetProperties+zesFabricPortGetState to find links between tiles and their bandwidth

Feb 02 '22 18:02 bgoglin

@AstroVPK To be perfectly honest, I don't like the direction this is going. A flag at zeInit should be the way to access sysman functions, even if the zeInit call is not the first one. From the user perspective, moving things from sysman to core is just a name change, and is therefore irrelevant unless in deciding which flags to use during zeInit. If you need to use a function prefixed by zes, you need to pass the flag to to enable sysman to zeInit.

I doubt categorizing what "applications" will need will converge: let's say somebody at some point would like to make a wrapper library to better manage fans when running intensive calculations?

Moving things to core might be a solution to avoid initializing sysman in the general case (and avoid paying the overhead if that is what the concern is here), but the option to initialize it at runtime and at will is still important.

Feb 02 '22 18:02 Kerilk

The use case is the need to initialize Sysman. An environment variable didn't work, doesn't work and will never work. The split of interfaces between Core and Sysman is irrelevant. Choosing function names based on which library to initialize also doesn't seem to be a good design metric.

I would encourage the L0 spec source be put on a public GitHub so that these type of changes can be discussed before mistakes are made.

Feb 02 '22 18:02 kevin-harms

We're in agreement to get rid of the environment variable and adding zesInit() to initialize Sysman. We're working on the details and will share the proposal when ready.

Feb 17 '22 22:02 AstroVPK

yes, i think after updating bios from another device we also running again sysman support other wise had disable and stop on services.msc. again and again returnback on environment variable. T try to open registry and open P9Rdr service then modify dword "start" on 2 , delete and take own services mode

May 16 '22 03:05 nahdia99

level-zero level-zero copied to clipboard

ZES_ENABLE_SYSMAN vs multiple layers using L0

level-zero
level-zero copied to clipboard