VulkanMemoryAllocator

Problem when running tests of sample application on NVIDIA RTX 3090

Open IAmNotHanni opened this issue 1 year ago • 5 comments

Hi

As mentioned in the update here, I am stuck in a loop when running the tests of the sample application on the NVIDIA RTX 3090. I created this issue separately because I believe it is unrelated to the issue on AMD Ryzen™ 9 7950X.

TESTING:
Test JSON
Saving JSON dump to file "JSON_VULKAN.json"
Test basics
Test vnaGetAllocatorInfo
Test virtual blocks
Test virtual blocks algorithms
Benchmark virtual blocks algorithms
Alignment,Algorithm,Strategy,Alloc time ms,Random operation time ms,Free time ms
1,TLSF,Default,0.8683,1.9644,0.6968
1,Linear,Default,0.4501,1.9759,1.0393
1,TLSF,MIN_MEMORY,0.5542,1.8697,0.6747
1,Linear,MIN_MEMORY,0.4277,1.9545,1.0324
1,TLSF,MIN_TIME,0.4838,1.7939,0.7068
1,Linear,MIN_TIME,0.5023,1.9511,1.0267
16,TLSF,Default,1.475,2.3691,0.7475
16,Linear,Default,0.44,1.9559,1.0447
16,TLSF,MIN_MEMORY,1.8215,3.5448,0.7413
16,Linear,MIN_MEMORY,0.4307,1.9208,1.0501
16,TLSF,MIN_TIME,0.7536,2.287,0.7677
16,Linear,MIN_TIME,0.4334,1.9472,1.0469
64,TLSF,Default,1.553,3.9169,0.7531
64,Linear,Default,0.4387,1.9565,1.0517
64,TLSF,MIN_MEMORY,1.8469,4.536,0.8009
64,Linear,MIN_MEMORY,0.433,1.951,1.0555
64,TLSF,MIN_TIME,0.7629,2.2192,0.7585
64,Linear,MIN_TIME,0.4442,1.9544,1.0469
256,TLSF,Default,1.8158,4.5182,0.7826
256,Linear,Default,0.4409,1.9617,1.0433
256,TLSF,MIN_MEMORY,1.771,4.434,0.7337
256,Linear,MIN_MEMORY,0.4375,1.9494,1.0511
256,TLSF,MIN_TIME,0.7781,2.1551,0.7544
256,Linear,MIN_TIME,0.4368,1.9581,1.0577
Test allocation versus resource size
Test Pool MinBlockCount
Test Pool MinAllocationAlignment
Test pools and allocation parameters
Test heap size limit
Testing memory usage:
  VMA_MEMORY_USAGE_UNKNOWN:
    Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=0
    Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=0
    Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, memoryTypeIndex=0
    Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, memoryTypeIndex=0
    Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, memoryTypeIndex=0
  VMA_MEMORY_USAGE_GPU_ONLY:
    Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=1
    Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=1
    Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, memoryTypeIndex=1
    Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, memoryTypeIndex=1
    Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, memoryTypeIndex=1
  VMA_MEMORY_USAGE_CPU_ONLY:
    Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=3
    Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=3
    Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
  VMA_MEMORY_USAGE_CPU_TO_GPU:
    Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=5
    Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=5
    Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
  VMA_MEMORY_USAGE_GPU_TO_CPU:
    Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=4
    Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=4
    Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
  VMA_MEMORY_USAGE_CPU_COPY:
    Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=0
    Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=0
    Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, memoryTypeIndex=0
    Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, memoryTypeIndex=0
    Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, memoryTypeIndex=0
  VMA_MEMORY_USAGE_GPU_LAZILY_ALLOCATED:
    Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, FAILED with res=-8
    Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, FAILED with res=-8
    Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
    Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
Testing statistics...
Testing aliasing...
  size: max(1399808, 8847360) = 8847360
  alignment: max(1024, 1024) = 1024
  memoryTypeBits: 3 & 3 = 3
Testing allocation aliasing...
Testing mapping...
Testing allocation-memory copy...
Test mapping hysteresis
Test VK_KHR_maintenance5
Testing mapping multithreaded...
Test linear allocator
Manually test linear allocator
Test linear allocator multi block
Test allocation algorithm correctness
Basic test TLSF
Basic test allocate pages
Test buffer device address
Test memory priority
Benchmark algorithms
    Algorithm=TLSF Empty Allocation=MIN_MEMORY FreeOrder=BACKWARD: allocations 0.0304009 s, free 0.0090137 s
    Algorithm=TLSF Empty Allocation=MIN_TIME FreeOrder=BACKWARD: allocations 0.0139352 s, free 0.0089325 s
    Algorithm=Linear Empty Allocation=Default FreeOrder=BACKWARD: allocations 0.011355 s, free 0.0084895 s
    Algorithm=TLSF Not empty Allocation=MIN_MEMORY FreeOrder=BACKWARD: allocations 0.0486588 s, free 0.0090518 s
    Algorithm=TLSF Not empty Allocation=MIN_TIME FreeOrder=BACKWARD: allocations 0.0158069 s, free 0.0106145 s
    Algorithm=Linear Not empty Allocation=Default FreeOrder=BACKWARD: allocations 0.0113662 s, free 0.0086673 s
    Algorithm=TLSF Empty Allocation=MIN_MEMORY FreeOrder=FORWARD: allocations 0.0297251 s, free 0.0101225 s
    Algorithm=TLSF Empty Allocation=MIN_TIME FreeOrder=FORWARD: allocations 0.0139524 s, free 0.0099312 s
    Algorithm=Linear Empty Allocation=Default FreeOrder=FORWARD: allocations 0.011273 s, free 0.008372 s
    Algorithm=TLSF Not empty Allocation=MIN_MEMORY FreeOrder=FORWARD: allocations 0.0494525 s, free 0.0103591 s
    Algorithm=TLSF Not empty Allocation=MIN_TIME FreeOrder=FORWARD: allocations 0.0158748 s, free 0.010693 s
    Algorithm=Linear Not empty Allocation=Default FreeOrder=FORWARD: allocations 0.0113684 s, free 0.0108735 s
Test defragmentation simple
  Persistently mapped option = 0
  Persistently mapped option = 1
Test defragmentation vs mapping
    Pass 0 moving 31 allocations
    Pass 1 moving 6 allocations
    Defragmentation: moved 31 allocations, 2031616 B, freed 5 memory blocks, 5242880 B
Test defragmentation simple
  Algorithm = Fast
VUID-vkBindImageMemory-memory-01047 ║ Validation Error: [ VUID-vkBindImageMemory-memory-01047 ] Object 0: handle = 0xec3f770000002066, type = VK_OBJECT_TYPE_DEVICE_MEMORY; | MessageID = 0xa316549f | vkBindImageMemory(): image require memoryTypeBits (0x3) but VkDeviceMemory 0xec3f770000002066[] was allocated with memoryTypeIndex (4). The Vulkan spec states: memory must have been allocated using one of the memory types allowed in the memoryTypeBits member of the VkMemoryRequirements structure returned from a call to vkGetImageMemoryRequirements with image (https://vulkan.lunarg.com/doc/view/1.3.275.0/windows/1.3-extensions/vkspec.html#VUID-vkBindImageMemory-memory-01047)

IAmNotHanni, Jun 18 '24 20:06

I attempted a git bisect, but I could not identify when the problem was introduced. If anyone has an idea where to start the bisect, let me know.

IAmNotHanni, Jun 18 '24 20:06

The validation error comes from AllocInfo::CreateImage when it's being called by TestDefragmentationAlgorithms.

IAmNotHanni, Jul 10 '24 21:07

It looks like the issue is deep in vmaCreateImage itself...

IAmNotHanni, Jul 10 '24 21:07

Inside VkResult VmaAllocator_T::AllocateMemory, vkMemReq.memoryTypeBits is not passed on to AllocateMemoryOfType?

if(createInfoFinal.pool != VK_NULL_HANDLE)
{
    VmaBlockVector& blockVector = createInfoFinal.pool->m_BlockVector;
    return AllocateMemoryOfType(
        createInfoFinal.pool,
        vkMemReq.size,
        vkMemReq.alignment,
        prefersDedicatedAllocation,
        dedicatedBuffer,
        dedicatedImage,
        dedicatedBufferImageUsage,
        createInfoFinal,
        blockVector.GetMemoryTypeIndex(),
        suballocType,
        createInfoFinal.pool->m_DedicatedAllocations,
        blockVector,
        allocationCount,
        pAllocations);
}

IAmNotHanni, Jul 10 '24 21:07

Inside function VmaAllocator_T::AllocateMemory:

  • When createInfoFinal.pool == VK_NULL_HANDLE, default pools are used, and vkMemReq.memoryTypeBits is used in a loop to find the preferred memory type.
  • When createInfoFinal.pool != VK_NULL_HANDLE, a custom pool is used. A custom pool always lives in the single memory type that was explicitly specified when the pool was created, which is why vkMemReq.memoryTypeBits is unused in that case.
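
To illustrate how a pool's fixed memory type can clash with a resource's requirements, here is a minimal standalone sketch. It is not VMA code: IsMemoryTypeAllowed is a hypothetical helper, and the only rule it relies on is the one quoted in the validation message above, namely that bit i of memoryTypeBits is set when memory type i is allowed. The example values are taken from the log in this issue.

```cpp
#include <cstdint>

// Hypothetical helper (not part of VMA or Vulkan).
// Returns true if memory type `memoryTypeIndex` is permitted by the
// `memoryTypeBits` mask of a VkMemoryRequirements structure
// (bit i set means memory type i is allowed).
constexpr bool IsMemoryTypeAllowed(uint32_t memoryTypeBits, uint32_t memoryTypeIndex)
{
    return memoryTypeIndex < 32u &&
           (memoryTypeBits & (1u << memoryTypeIndex)) != 0u;
}
```

With the values from the log, IsMemoryTypeAllowed(0x3B, 4) is true for the buffers, while IsMemoryTypeAllowed(0x3, 4) is false for the images, which matches the memoryTypeIndex (4) reported by VUID-vkBindImageMemory-memory-01047.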

adam-sawicki-a, Jul 11 '24 11:07

Update: This issue still exists in VMA 3.2.1 when using NVIDIA GeForce RTX 3090.

VMA_VERSION_3_2_1_RTX3090_BUG.txt

IAmNotHanni, Mar 12 '25 02:03

In TestDefragmentationAlgorithms() I see:

// ...
uint32_t memTypeIndex = UINT32_MAX;
vmaFindMemoryTypeIndexForBufferInfo(g_hAllocator, &bufCreateInfo, &allocCreateInfo, &memTypeIndex);

VmaPoolCreateInfo poolCreateInfo = {};
poolCreateInfo.blockSize = BLOCK_SIZE;
poolCreateInfo.memoryTypeIndex = memTypeIndex; // This is used for creating the buffers and the images...

VmaPool pool;
TEST(vmaCreatePool(g_hAllocator, &poolCreateInfo, &pool) == VK_SUCCESS);
allocCreateInfo.pool = pool;
// ...

There, memTypeIndex is found by calling vmaFindMemoryTypeIndexForBufferInfo, but the pool we create is used for both buffers and images. Shouldn't this call vmaFindMemoryTypeIndexForImageInfo as well? I assume we can't use 2 pools here because this is supposed to demonstrate defragmentation on one pool. Doesn't that mean the test would need to search for a poolCreateInfo.memoryTypeIndex that both vmaFindMemoryTypeIndexForBufferInfo and vmaFindMemoryTypeIndexForImageInfo accept? I am confused.
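
The constraint in question can be pictured with a small standalone sketch. This is only an illustration under stated assumptions, not the change that was made in VMA: FindCommonMemoryTypeIndex and kNoMemoryType are hypothetical names, and picking the lowest common index ignores memory property flags, which a real test would still have to consider.

```cpp
#include <cstdint>

// Hypothetical sentinel meaning "no memory type satisfies both masks".
const uint32_t kNoMemoryType = UINT32_MAX;

// Hypothetical helper (not part of VMA).
// Intersects the memoryTypeBits masks of a buffer and an image and returns
// the lowest memory type index allowed by both, or kNoMemoryType if the
// intersection is empty.
uint32_t FindCommonMemoryTypeIndex(uint32_t bufferTypeBits, uint32_t imageTypeBits)
{
    const uint32_t common = bufferTypeBits & imageTypeBits;
    for (uint32_t i = 0; i < 32u; ++i)
    {
        if ((common & (1u << i)) != 0u)
            return i;
    }
    return kNoMemoryType;
}
```

For the masks in the log above (0x3B for buffers, 0x3 for images), the intersection is 0x3, so only memory types 0 and 1 could back a pool shared by both resource kinds. In real code, one option might be to pass such an intersection through VmaAllocationCreateInfo::memoryTypeBits so that VMA picks the index itself.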

IAmNotHanni, Apr 01 '25 18:04

Thank you for reminding me about this bug. Hopefully it is fixed now.

adam-sawicki-a, Apr 09 '25 14:04

> Thank you for reminding me about this bug. Hopefully it is fixed now.

Yes, this fixed it! (Tested on RTX 3090 and Intel Arc A770.) The issue with AMD Ryzen™ 9 7950X (AMD Radeon(TM) Graphics) still exists: https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator/issues/339

IAmNotHanni, Apr 09 '25 20:04