Relaxed Allocation Limit in Level Zero

Open jjfumero opened this issue 2 years ago • 9 comments

When playing around with the ZE_RELAXED_ALLOCATION_LIMITS_EXP_FLAG_MAX_SIZE flag for buffer allocation I noticed the following:

If I request a buffer larger than my system allows (26 GB in my case), I get error 0x78000009 (size argument is not supported by the device), which is expected.

For context, this is the output of the device memory properties of my system:

stype : DEVICE_MEMORY_PROPERTIES
pNext : 0x0
flags : Device::{ ? }
maxClockRate : 0      <-- Not sure why this value is 0 
maxBusWidth : 64
totalSize : 26706980864   <<- ~ 26GB
name : DDR

However, I can call the allocation functions (e.g., zeMemAllocDevice) for, say, 3 buffers of 20 GB each. That is 60 GB of global memory in total, which I should not be allowed to allocate. Each alloc call requests a buffer smaller than the maximum global memory available, but combined they are much larger. Instead of getting an error code, I get a crash.

You can reproduce this using this sample code: https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp

Is this behaviour expected? Have you considered, or is there already, anything in the Level Zero API similar to this call?

bool canBeAllocated = zeCanDeviceBufferSizeBeAllocated(context, deviceDesc, alignment, device, &buffer);

That is, a function we can use to query whether there is space for a given buffer before the actual allocation.
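Until such a query exists, the idea can be approximated on the application side. The following is a minimal sketch, assuming the application records every allocation it makes and compares the running total against the totalSize value reported in the device memory properties above. The DeviceMemoryBudget class and its methods are hypothetical names introduced for illustration; they are not part of the Level Zero API, and totalSize is only an upper bound, since on an integrated GPU the memory is shared with the rest of the system.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical application-side bookkeeping; not part of the Level Zero API.
// It only tracks what this process has requested, not what is actually free.
class DeviceMemoryBudget {
public:
    explicit DeviceMemoryBudget(std::uint64_t totalBytes) : total_(totalBytes) {}

    // True if `bytes` more would still fit under the reported total.
    bool canReserve(std::uint64_t bytes) const {
        return bytes <= total_ - used_;
    }

    // Record a planned allocation; returns false and records nothing if over budget.
    bool reserve(std::uint64_t bytes) {
        if (!canReserve(bytes)) return false;
        used_ += bytes;
        return true;
    }

    // Call after the matching buffer has been freed.
    void release(std::uint64_t bytes) {
        assert(bytes <= used_);
        used_ -= bytes;
    }

    std::uint64_t remaining() const { return total_ - used_; }

private:
    std::uint64_t total_;
    std::uint64_t used_ = 0;
};
```

With a budget of 26706980864 bytes, a first reserve of 20 GB succeeds and a second one is rejected before any allocation call is made, which is the behaviour the proposed query would provide.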

Hardware/ Software details:

  • Intel Driver: 21.38.21026
  • Total RAM: 32 GB
  • OpenCL 3.0
  • GPU: Intel HD Graphics from i9-10885H

jjfumero avatar Apr 06 '22 13:04 jjfumero

Hi @jjfumero

This is more of an implementation-specific detail, and it depends on how the driver stack works. The implementation you are using here is the Intel L0 driver, and that SW stack basically uses lazy allocation, or lazy residency, of allocations.

It works like this: you can create several allocations, as long as each one is no larger than the maximum allocatable size. The reason you can create several that in total exceed the device's total memory is that those allocations are made resident on the device only when needed. That is, you could have N allocations, but your workload might need only one at a time when executing on the device. So the device memory doesn't need to hold all of them simultaneously, which is why your allocations succeed.

Now, if you have a kernel that actually needs all those allocations, then when submitting that kernel the driver will try to make all of them resident and, as expected, submission will fail, since there's no space to make all of them resident. In this case, zeCommandQueueExecuteCommandLists may return an OUT_OF_MEMORY error.
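The behaviour described above can be illustrated with a toy model. This is only a sketch of the reasoning, not how the driver is implemented: ToyDevice, ToyResult, alloc and submit are made-up names, and the model assumes the only checks are a per-allocation size limit at allocation time and a total-capacity check at submission time.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Toy model only: these names and result values are illustrative, not Level Zero types.
enum class ToyResult { Success, UnsupportedSize, OutOfMemory };

struct ToyDevice {
    std::uint64_t maxAllocSize;  // largest single allocation accepted
    std::uint64_t totalSize;     // memory that can be resident at the same time

    // Allocation only checks the per-allocation limit; nothing is made resident yet.
    ToyResult alloc(std::uint64_t bytes) const {
        return bytes <= maxAllocSize ? ToyResult::Success : ToyResult::UnsupportedSize;
    }

    // Submission must make every buffer the kernel uses resident simultaneously.
    ToyResult submit(const std::vector<std::uint64_t>& kernelBuffers) const {
        const std::uint64_t needed = std::accumulate(
            kernelBuffers.begin(), kernelBuffers.end(), std::uint64_t{0});
        return needed <= totalSize ? ToyResult::Success : ToyResult::OutOfMemory;
    }
};
```

In this model, three 20 GB allocations each succeed on a device with about 26 GB, a kernel that uses one of them can be submitted, and a kernel that uses all three fails at submission rather than at allocation.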

jandres742 avatar Apr 06 '22 16:04 jandres742

@jandres742 thank you for the clarification. So unless the allocated buffers are required by the kernel being executed, they are not actually allocated. But does this happen for shared buffers, device buffers and host buffers in Level Zero?

I understand the lazy allocation might happen for device buffers, but I don't see why the other types of buffers should be lazily allocated. Also, I get a crash during the buffer allocation as shown in this example:

ze_result_t result;
void *sharedBuffer = nullptr;

hostDesc.pNext = &exceedCapacity;
memAllocDesc.pNext = &exceedCapacity;

std::cout << "Allocating Shared: " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocShared(context, &memAllocDesc, &hostDesc, allocSize, 128, device, &sharedBuffer);
if (result == 0x78000009) {
     std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}

void *deviceBuffer = nullptr;
std::cout << "Allocating On Device: " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocDevice(context, &memAllocDesc, allocSize, 64, device, &deviceBuffer);
if (result == 0x78000009) {
    std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}

void *hostBuffer = nullptr;
std::cout << "Allocating from Host " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocHost(context, &hostDesc, allocSize, 64, &hostBuffer);
if (result == 0x78000009) {
    std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}
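Note that the checks above only handle two outcomes, so any other result code from an allocation call would print nothing. A small helper can make every outcome visible. The sketch below is self-contained and therefore mirrors a few result values as local constants; the assumption is that they match the ze_result_t enumerators named in the comments, and in a real program it would be preferable to include the Level Zero header and compare against those enumerators directly. describeAllocResult is a hypothetical helper name.

```cpp
#include <cstdint>
#include <string>

// Values mirrored locally so the sketch compiles without the Level Zero headers.
// Assumption: they match ze_result_t in ze_api.h; prefer the header's enumerators.
constexpr std::uint32_t kSuccess           = 0x0;         // ZE_RESULT_SUCCESS
constexpr std::uint32_t kOutOfHostMemory   = 0x70000002;  // ZE_RESULT_ERROR_OUT_OF_HOST_MEMORY
constexpr std::uint32_t kOutOfDeviceMemory = 0x70000003;  // ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY
constexpr std::uint32_t kUnsupportedSize   = 0x78000009;  // ZE_RESULT_ERROR_UNSUPPORTED_SIZE

// Hypothetical helper: turns an allocation result into a printable message,
// including a fallback so that unexpected codes are never silently dropped.
std::string describeAllocResult(std::uint32_t result) {
    switch (result) {
        case kSuccess:           return "Alloc OK";
        case kUnsupportedSize:   return "size argument is not supported by the device";
        case kOutOfHostMemory:   return "out of host memory";
        case kOutOfDeviceMemory: return "out of device memory";
        default:                 return "unexpected result code";
    }
}
```

Each of the three allocation calls could then print describeAllocResult(result) instead of repeating the if/else chain.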

In my case, I can allocate global memory buffers of up to 26GB. If I run this and I allocate 20GB per buffer, I get a crash during the allocation using the zeMemAllocHost function (3rd alloc function).

What I take from this is:

  1. zeMemAllocDevice is lazily allocated.
  2. zeMemAllocHost and zeMemAllocShared are directly allocated (blocking calls) and directly accessible from the host.

If this is the case, is it expected to get a crash during the execution of the zeMemAllocHost function? Or should we get an exception or an error code indicating an allocation failure?

You can reproduce this error using this code: https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp

jjfumero avatar Apr 07 '22 08:04 jjfumero

So unless the allocated buffers are required by the kernel being executed, they are not actually allocated. But does this happen for shared buffers, device buffers and host buffers in Level Zero

Rather than not being allocated, they are guaranteed to be available by the time the GPU kernel executes. They could be allocated at any time between the allocation call and kernel execution; there's no exact point. The only guarantee is that they will be ready by the time of execution.

All allocations in L0 driver go through KMD, so all share similar behavior, which is what you might be seeing.

If this is the case, is it expected to get a crash

What crash do you get?

jandres742 avatar Apr 07 '22 14:04 jandres742

What crash do you get?

I am not sure what to report. The Linux terminal in which I run this program suddenly closes, along with the subprocesses I started from it. Also, dmesg does not seem to report anything related to the crash; the terminal window just closes. Is there any way to report this type of crash?

jjfumero avatar Apr 07 '22 14:04 jjfumero

@jjfumero are you still seeing the crash?

jandres742 avatar Aug 25 '22 07:08 jandres742

Hi @jandres742. Sorry for the delay. I just checked with the latest driver (22.35.24055) on Ubuntu and I still get the crash, with no warnings or errors, when I allocate more than I should. I am not sure whether this is the expected behaviour. That is, should the developer keep track of the remaining memory space, or can the Level Zero implementation control this and report an error?

To reproduce it, I am still using the program I sent https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp

./levelZeroAlloc 200000000000

The problem with this test in my case is that all applications running on or using the iGPU suddenly stop and close.

jjfumero avatar Sep 15 '22 13:09 jjfumero

@jjfumero could you confirm whether you are seeing the issue with latest drivers?

jandres742 avatar Mar 14 '23 16:03 jandres742

Hi @jandres742 , I am not using the latest drivers. I will update, and let you know.

jjfumero avatar Mar 14 '23 17:03 jjfumero

@jandres742 , I confirm this issue is gone with the latest driver : https://github.com/intel/compute-runtime/releases/tag/22.53.25242.13

To reproduce it: https://github.com/jjfumero/codeBlogArticles/blob/master/april2022/levelZeroAlloc/levelZeroAlloc.cpp

> ./levelZeroAlloc 30000000000
Device   : Intel(R) UHD Graphics 770 [0x4680]
Type     : GPU
Vendor ID: 8086
#Queue Groups: 1
Allocating Shared Memory: 30000000000 bytes - 30 (GB) 
size argument is not supported by the device 
Allocating Device Memory: 30000000000 bytes - 30 (GB) 
size argument is not supported by the device 
Allocating Host Memory: 30000000000 bytes - 30 (GB) 
	Alloc OK

std::cout << "Allocating Shared Memory: " << allocSize << " bytes - " << (allocSize * 1e-9 ) << " (GB) " << std::endl;
result = zeMemAllocShared(context, &memAllocDesc, &hostDesc, allocSize, 128, device, &sharedBuffer);
if (result == 0x78000009) {
    std::cout << "size argument is not supported by the device \n";
} else if (result == ZE_RESULT_SUCCESS) {
    std::cout << "\tAlloc OK" << std::endl;
}

Thanks

jjfumero avatar Mar 16 '23 09:03 jjfumero