VulkanMemoryAllocator
Problem when running tests of sample application on NVIDIA RTX 3090
Hi
As mentioned in the update here, I am stuck in a loop when running the tests of the sample application on the NVIDIA RTX 3090. I opened this as a separate issue because I believe it is unrelated to the issue on AMD Ryzen™ 9 7950X.
TESTING:
Test JSON
Saving JSON dump to file "JSON_VULKAN.json"
Test basics
Test vnaGetAllocatorInfo
Test virtual blocks
Test virtual blocks algorithms
Benchmark virtual blocks algorithms
Alignment,Algorithm,Strategy,Alloc time ms,Random operation time ms,Free time ms
1,TLSF,Default,0.8683,1.9644,0.6968
1,Linear,Default,0.4501,1.9759,1.0393
1,TLSF,MIN_MEMORY,0.5542,1.8697,0.6747
1,Linear,MIN_MEMORY,0.4277,1.9545,1.0324
1,TLSF,MIN_TIME,0.4838,1.7939,0.7068
1,Linear,MIN_TIME,0.5023,1.9511,1.0267
16,TLSF,Default,1.475,2.3691,0.7475
16,Linear,Default,0.44,1.9559,1.0447
16,TLSF,MIN_MEMORY,1.8215,3.5448,0.7413
16,Linear,MIN_MEMORY,0.4307,1.9208,1.0501
16,TLSF,MIN_TIME,0.7536,2.287,0.7677
16,Linear,MIN_TIME,0.4334,1.9472,1.0469
64,TLSF,Default,1.553,3.9169,0.7531
64,Linear,Default,0.4387,1.9565,1.0517
64,TLSF,MIN_MEMORY,1.8469,4.536,0.8009
64,Linear,MIN_MEMORY,0.433,1.951,1.0555
64,TLSF,MIN_TIME,0.7629,2.2192,0.7585
64,Linear,MIN_TIME,0.4442,1.9544,1.0469
256,TLSF,Default,1.8158,4.5182,0.7826
256,Linear,Default,0.4409,1.9617,1.0433
256,TLSF,MIN_MEMORY,1.771,4.434,0.7337
256,Linear,MIN_MEMORY,0.4375,1.9494,1.0511
256,TLSF,MIN_TIME,0.7781,2.1551,0.7544
256,Linear,MIN_TIME,0.4368,1.9581,1.0577
Test allocation versus resource size
Test Pool MinBlockCount
Test Pool MinAllocationAlignment
Test pools and allocation parameters
Test heap size limit
Testing memory usage:
VMA_MEMORY_USAGE_UNKNOWN:
Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=0
Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=0
Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, memoryTypeIndex=0
Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, memoryTypeIndex=0
Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, memoryTypeIndex=0
VMA_MEMORY_USAGE_GPU_ONLY:
Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=1
Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=1
Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, memoryTypeIndex=1
Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, memoryTypeIndex=1
Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, memoryTypeIndex=1
VMA_MEMORY_USAGE_CPU_ONLY:
Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=3
Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=3
Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
VMA_MEMORY_USAGE_CPU_TO_GPU:
Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=5
Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=5
Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
VMA_MEMORY_USAGE_GPU_TO_CPU:
Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=4
Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=4
Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
VMA_MEMORY_USAGE_CPU_COPY:
Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, memoryTypeIndex=0
Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, memoryTypeIndex=0
Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, memoryTypeIndex=0
Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, memoryTypeIndex=0
Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, memoryTypeIndex=0
VMA_MEMORY_USAGE_GPU_LAZILY_ALLOCATED:
Buffer TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3B, FAILED with res=-8
Buffer TRANSFER_DST + VERTEX_BUFFER: memoryTypeBits=0x3B, FAILED with res=-8
Image OPTIMAL TRANSFER_DST + TRANSFER_SRC: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL TRANSFER_DST + SAMPLED: memoryTypeBits=0x3, FAILED with res=-8
Image OPTIMAL SAMPLED + COLOR_ATTACHMENT: memoryTypeBits=0x3, FAILED with res=-8
Testing statistics...
Testing aliasing...
size: max(1399808, 8847360) = 8847360
alignment: max(1024, 1024) = 1024
memoryTypeBits: 3 & 3 = 3
Testing allocation aliasing...
Testing mapping...
Testing allocation-memory copy...
Test mapping hysteresis
Test VK_KHR_maintenance5
Testing mapping multithreaded...
Test linear allocator
Manually test linear allocator
Test linear allocator multi block
Test allocation algorithm correctness
Basic test TLSF
Basic test allocate pages
Test buffer device address
Test memory priority
Benchmark algorithms
Algorithm=TLSF Empty Allocation=MIN_MEMORY FreeOrder=BACKWARD: allocations 0.0304009 s, free 0.0090137 s
Algorithm=TLSF Empty Allocation=MIN_TIME FreeOrder=BACKWARD: allocations 0.0139352 s, free 0.0089325 s
Algorithm=Linear Empty Allocation=Default FreeOrder=BACKWARD: allocations 0.011355 s, free 0.0084895 s
Algorithm=TLSF Not empty Allocation=MIN_MEMORY FreeOrder=BACKWARD: allocations 0.0486588 s, free 0.0090518 s
Algorithm=TLSF Not empty Allocation=MIN_TIME FreeOrder=BACKWARD: allocations 0.0158069 s, free 0.0106145 s
Algorithm=Linear Not empty Allocation=Default FreeOrder=BACKWARD: allocations 0.0113662 s, free 0.0086673 s
Algorithm=TLSF Empty Allocation=MIN_MEMORY FreeOrder=FORWARD: allocations 0.0297251 s, free 0.0101225 s
Algorithm=TLSF Empty Allocation=MIN_TIME FreeOrder=FORWARD: allocations 0.0139524 s, free 0.0099312 s
Algorithm=Linear Empty Allocation=Default FreeOrder=FORWARD: allocations 0.011273 s, free 0.008372 s
Algorithm=TLSF Not empty Allocation=MIN_MEMORY FreeOrder=FORWARD: allocations 0.0494525 s, free 0.0103591 s
Algorithm=TLSF Not empty Allocation=MIN_TIME FreeOrder=FORWARD: allocations 0.0158748 s, free 0.010693 s
Algorithm=Linear Not empty Allocation=Default FreeOrder=FORWARD: allocations 0.0113684 s, free 0.0108735 s
Test defragmentation simple
Persistently mapped option = 0
Persistently mapped option = 1
Test defragmentation vs mapping
Pass 0 moving 31 allocations
Pass 1 moving 6 allocations
Defragmentation: moved 31 allocations, 2031616 B, freed 5 memory blocks, 5242880 B
Test defragmentation simple
Algorithm = Fast
VUID-vkBindImageMemory-memory-01047 ║ Validation Error: [ VUID-vkBindImageMemory-memory-01047 ] Object 0: handle = 0xec3f770000002066, type = VK_OBJECT_TYPE_DEVICE_MEMORY; | MessageID = 0xa316549f | vkBindImageMemory(): image require memoryTypeBits (0x3) but VkDeviceMemory 0xec3f770000002066[] was allocated with memoryTypeIndex (4). The Vulkan spec states: memory must have been allocated using one of the memory types allowed in the memoryTypeBits member of the VkMemoryRequirements structure returned from a call to vkGetImageMemoryRequirements with image (https://vulkan.lunarg.com/doc/view/1.3.275.0/windows/1.3-extensions/vkspec.html#VUID-vkBindImageMemory-memory-01047)
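The rule quoted in the validation error above comes down to a single bit test. Here is a minimal sketch of that test using the numbers from the log; `IsMemoryTypeAllowed` is a hypothetical helper written for illustration and is not part of VMA or Vulkan.

```cpp
#include <cstdint>

// Hypothetical helper (not a VMA or Vulkan function): returns true when
// memory type `memoryTypeIndex` is permitted by the `memoryTypeBits` mask
// reported in VkMemoryRequirements.
constexpr bool IsMemoryTypeAllowed(uint32_t memoryTypeBits, uint32_t memoryTypeIndex)
{
    return memoryTypeIndex < 32u && (memoryTypeBits & (1u << memoryTypeIndex)) != 0u;
}

// Values taken from the log above.
// Image: memoryTypeBits = 0x3 allows only types 0 and 1, so type 4 is rejected.
static_assert(!IsMemoryTypeAllowed(0x3u, 4u), "type 4 is not in 0x3");
// Buffer: memoryTypeBits = 0x3B allows types 0, 1, 3, 4 and 5, so type 4 is fine.
static_assert(IsMemoryTypeAllowed(0x3Bu, 4u), "type 4 is in 0x3B");
```

This matches the log: memory of type 4 is valid for the buffers but not for the OPTIMAL images.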
I attempted a git bisect, but I could not identify when the problem was introduced. If anyone has an idea where to start the bisect, let me know.
The validation error comes from AllocInfo::CreateImage when it's being called by TestDefragmentationAlgorithms.
It looks like the issue is deep in vmaCreateImage itself...
Inside VmaAllocator_T::AllocateMemory, vkMemReq.memoryTypeBits does not seem to be passed on to AllocateMemoryOfType:
if(createInfoFinal.pool != VK_NULL_HANDLE)
{
    VmaBlockVector& blockVector = createInfoFinal.pool->m_BlockVector;
    return AllocateMemoryOfType(
        createInfoFinal.pool,
        vkMemReq.size,
        vkMemReq.alignment,
        prefersDedicatedAllocation,
        dedicatedBuffer,
        dedicatedImage,
        dedicatedBufferImageUsage,
        createInfoFinal,
        blockVector.GetMemoryTypeIndex(),
        suballocType,
        createInfoFinal.pool->m_DedicatedAllocations,
        blockVector,
        allocationCount,
        pAllocations);
}
Inside function VmaAllocator_T::AllocateMemory:
- When createInfoFinal.pool == VK_NULL_HANDLE, which means default pools are used, vkMemReq.memoryTypeBits is used to find the preferred memory type in a loop.
- When createInfoFinal.pool != VK_NULL_HANDLE, we are using a custom pool, and a custom pool is always created in one memory type, explicitly specified when the pool was created. This is why vkMemReq.memoryTypeBits is unused in that case.
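To make the two paths concrete, here is a deliberately simplified model of that branch. It is not VMA's code: `ChooseMemoryTypeModel`, `kNoPool` and `kNoMemoryType` are hypothetical names, and taking the lowest set bit is an assumption made for brevity (VMA's actual selection also weighs the requested usage and flags).

```cpp
#include <cstdint>

// Hypothetical sentinels for this sketch only.
constexpr uint32_t kNoPool = UINT32_MAX;       // stands in for pool == VK_NULL_HANDLE
constexpr uint32_t kNoMemoryType = UINT32_MAX; // no candidate memory type found

// Simplified model of the branch described above (not VMA's real code).
constexpr uint32_t ChooseMemoryTypeModel(uint32_t memoryTypeBits, uint32_t poolMemoryTypeIndex)
{
    if (poolMemoryTypeIndex != kNoPool)
    {
        // Custom pool: the memory type was fixed when the pool was created,
        // so memoryTypeBits is not consulted at all.
        return poolMemoryTypeIndex;
    }
    // Default pools: walk the types permitted by memoryTypeBits.
    // Assumption: pick the lowest permitted index.
    for (uint32_t i = 0; i < 32u; ++i)
    {
        if ((memoryTypeBits & (1u << i)) != 0u)
        {
            return i;
        }
    }
    return kNoMemoryType;
}

// Image requirements from the log: memoryTypeBits = 0x3.
static_assert(ChooseMemoryTypeModel(0x3u, kNoPool) == 0u, "default path respects the mask");
static_assert(ChooseMemoryTypeModel(0x3u, 4u) == 4u, "pool path returns the pool's type");
```

In this model, the custom-pool path returns type 4 even though the image only accepts types 0 and 1. That is the combination the validation layer reports, so the caller is responsible for creating the pool in a memory type that suits every resource placed in it.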
Update: This issue still exists in VMA 3.2.1 when using NVIDIA GeForce RTX 3090.
In TestDefragmentationAlgorithms() I see:
// ...
uint32_t memTypeIndex = UINT32_MAX;
vmaFindMemoryTypeIndexForBufferInfo(g_hAllocator, &bufCreateInfo, &allocCreateInfo, &memTypeIndex);
VmaPoolCreateInfo poolCreateInfo = {};
poolCreateInfo.blockSize = BLOCK_SIZE;
poolCreateInfo.memoryTypeIndex = memTypeIndex; // This is used for creating the buffers and the images...
VmaPool pool;
TEST(vmaCreatePool(g_hAllocator, &poolCreateInfo, &pool) == VK_SUCCESS);
allocCreateInfo.pool = pool;
// ...
Here, memTypeIndex is found by calling vmaFindMemoryTypeIndexForBufferInfo, but the pool we create is used for both buffers and images. Shouldn't this call vmaFindMemoryTypeIndexForImageInfo as well? I assume we can't use two pools here because this is supposed to demonstrate defragmentation on one pool. Doesn't that mean the test would need to search for a poolCreateInfo.memoryTypeIndex that both vmaFindMemoryTypeIndexForBufferInfo and vmaFindMemoryTypeIndexForImageInfo accept? I am confused.
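One way to picture what "a memory type both are ok with" means is to intersect the two memoryTypeBits masks and pick an index from the result. The sketch below does only the bit arithmetic, with the values from the log (0x3B for buffers, 0x3 for images). `CommonMemoryTypeBits`, `LowestCommonMemoryType` and `kNoCommonType` are hypothetical names, and this is not necessarily how the test was eventually fixed.

```cpp
#include <cstdint>

// Hypothetical sentinel for this sketch: no memory type satisfies both masks.
constexpr uint32_t kNoCommonType = UINT32_MAX;

// Memory types acceptable to both the buffer and the image.
constexpr uint32_t CommonMemoryTypeBits(uint32_t bufferBits, uint32_t imageBits)
{
    return bufferBits & imageBits;
}

// Lowest memory type index acceptable to both, or kNoCommonType if none exists.
constexpr uint32_t LowestCommonMemoryType(uint32_t bufferBits, uint32_t imageBits)
{
    const uint32_t common = bufferBits & imageBits;
    for (uint32_t i = 0; i < 32u; ++i)
    {
        if ((common & (1u << i)) != 0u)
        {
            return i;
        }
    }
    return kNoCommonType;
}

// Values from the log: buffers report 0x3B, OPTIMAL images report 0x3.
static_assert(CommonMemoryTypeBits(0x3Bu, 0x3u) == 0x3u, "only types 0 and 1 suit both");
static_assert(LowestCommonMemoryType(0x3Bu, 0x3u) == 0u, "type 0 is the lowest common type");
```

With real VMA calls, the intersected mask could be passed as the memoryTypeBits argument of vmaFindMemoryTypeIndex, so that the library chooses among the common types according to allocCreateInfo instead of simply taking the lowest index.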
Thank you for reminding me about this bug. Hopefully it is fixed now.
Yes, this fixed it! (Tested on RTX 3090 and Intel Arc A770) The issue with AMD Ryzen™ 9 7950X (AMD Radeon(TM) Graphics) still exists: https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator/issues/339