Vulkan.jl icon indicating copy to clipboard operation
Vulkan.jl copied to clipboard

Presence of garbage data when retrieving physical device memory types

Open exaexa opened this issue 2 years ago • 9 comments

Hi, I'm getting what seems to be an overflowing buffer reads when trying to enumerate the available physical device memory types.

Reduced to the simplest reproducer I know the problem looks like this:

using Vulkan
println.(get_physical_device_memory_properties(unwrap(enumerate_physical_devices(Instance([],[])))[1]).memory_types);

The command lists a few valid memory types, but these are followed by garbage that looks like random data picked from uninitialized memory, such as:

MemoryType(MemoryPropertyFlag(), 0x00000001)
MemoryType(MemoryPropertyFlag(), 0x00000001)
MemoryType(MemoryPropertyFlag(), 0x00000001)
MemoryType(MemoryPropertyFlag(), 0x00000001)
MemoryType(MemoryPropertyFlag(), 0x00000001)
MemoryType(MemoryPropertyFlag(), 0x00000001)
MemoryType(MemoryPropertyFlag(), 0x00000001)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT), 0x00000000)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT), 0x00000001)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT | MEMORY_PROPERTY_HOST_CACHED_BIT), 0x00000001)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT), 0x00000002)
MemoryType(MemoryPropertyFlag(), 0x00000004)             #this is the first invalid one, there's only 3 memory hea entries
MemoryType(MemoryPropertyFlag(), 0x00004000)
MemoryType(MemoryPropertyFlag(), 0x0000001f)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT | MEMORY_PROPERTY_HOST_CACHED_BIT | MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT), 0x0000001f)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT | MEMORY_PROPERTY_HOST_CACHED_BIT | MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT), 0x00000008)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT | MEMORY_PROPERTY_HOST_CACHED_BIT | MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT), 0x0000001f)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT | MEMORY_PROPERTY_HOST_CACHED_BIT | MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT), 0x0000001f)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT), 0x00000001)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT), 0x42a6aaab)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_HOST_CACHED_BIT), 0x00000008)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_HOST_CACHED_BIT), 0x00000002)
MemoryType(MemoryPropertyFlag(), 0x437fe000)
MemoryType(MemoryPropertyFlag(), 0x40ffc000)
MemoryType(MemoryPropertyFlag(), 0x7d9d8d00)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT | MEMORY_PROPERTY_HOST_CACHED_BIT | MEMORY_PROPERTY_RESERVED_8_BIT_NV), 0x00000001)
MemoryType(MemoryPropertyFlag(), 0xffffff80)
MemoryType(MemoryPropertyFlag(MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT | MEMORY_PROPERTY_HOST_COHERENT_BIT | MEMORY_PROPERTY_HOST_CACHED_BIT | MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT | MEMORY_PROPERTY_PROTECTED_BIT | MEMORY_PROPERTY_DEVICE_COHERENT_BIT_AMD | MEMORY_PROPERTY_DEVICE_UNCACHED_BIT_AMD | MEMORY_PROPERTY_RESERVED_8_BIT_NV), 0x00000002)
MemoryType(MemoryPropertyFlag(), 0x0223f5f0)
MemoryType(MemoryPropertyFlag(), 0x00000000)
MemoryType(MemoryPropertyFlag(), 0x00000002)
MemoryType(MemoryPropertyFlag(), 0x03a16470)

The beginning of the listing checks with the vulkaninfo, the rest seems to be garbage.

I'll try to debug what's wrong and report back. Is there any guess on whether this is a possible "premature GC" issue?

Thanks! -mk


Relevant vulkaninfo section:

VkPhysicalDeviceMemoryProperties:
=================================
memoryHeaps: count = 3
	memoryHeaps[0]:
		size   = 4294967296 (0x100000000) (4.00 GiB)
		budget = 4099145728 (0xf4540000) (3.82 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
	memoryHeaps[1]:
		size   = 25050470400 (0x5d51fd800) (23.33 GiB)
		budget = 25050470400 (0x5d51fd800) (23.33 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags:
			None
	memoryHeaps[2]:
		size   = 257949696 (0x0f600000) (246.00 MiB)
		budget = 254935040 (0x0f320000) (243.12 MiB)
		usage  = 3014656 (0x002e0000) (2.88 MiB)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
memoryTypes: count = 11
	memoryTypes[0]:
		heapIndex     = 1
		propertyFlags = 0x0000:
			None
		usable for:
			IMAGE_TILING_OPTIMAL:
				None
			IMAGE_TILING_LINEAR:
				color images
				(non-sparse, non-transient)
	memoryTypes[1]:
		heapIndex     = 1
		propertyFlags = 0x0000:
			None
		usable for:
			IMAGE_TILING_OPTIMAL:
				color images
				(non-sparse)
			IMAGE_TILING_LINEAR:
				None
	memoryTypes[2]:
		heapIndex     = 1
		propertyFlags = 0x0000:
			None
		usable for:
			IMAGE_TILING_OPTIMAL:
				FORMAT_D16_UNORM
				(non-sparse, non-transient)
			IMAGE_TILING_LINEAR:
				None
	memoryTypes[3]:
		heapIndex     = 1
		propertyFlags = 0x0000:
			None
		usable for:
			IMAGE_TILING_OPTIMAL:
				FORMAT_X8_D24_UNORM_PACK32
				FORMAT_D24_UNORM_S8_UINT
				(non-sparse, non-transient)
			IMAGE_TILING_LINEAR:
				None
	memoryTypes[4]:
		heapIndex     = 1
		propertyFlags = 0x0000:
			None
		usable for:
			IMAGE_TILING_OPTIMAL:
				FORMAT_D32_SFLOAT
				(non-sparse, non-transient)
			IMAGE_TILING_LINEAR:
				None
	memoryTypes[5]:
		heapIndex     = 1
		propertyFlags = 0x0000:
			None
		usable for:
			IMAGE_TILING_OPTIMAL:
				FORMAT_D32_SFLOAT_S8_UINT
				(non-sparse, non-transient)
			IMAGE_TILING_LINEAR:
				None
	memoryTypes[6]:
		heapIndex     = 1
		propertyFlags = 0x0000:
			None
		usable for:
			IMAGE_TILING_OPTIMAL:
				FORMAT_S8_UINT
				(non-sparse, non-transient)
			IMAGE_TILING_LINEAR:
				None
	memoryTypes[7]:
		heapIndex     = 0
		propertyFlags = 0x0001: count = 1
			MEMORY_PROPERTY_DEVICE_LOCAL_BIT
		usable for:
			IMAGE_TILING_OPTIMAL:
				color images
				FORMAT_D16_UNORM
				FORMAT_X8_D24_UNORM_PACK32
				FORMAT_D32_SFLOAT
				FORMAT_S8_UINT
				FORMAT_D24_UNORM_S8_UINT
				FORMAT_D32_SFLOAT_S8_UINT
			IMAGE_TILING_LINEAR:
				color images
				(non-sparse, non-transient)
	memoryTypes[8]:
		heapIndex     = 1
		propertyFlags = 0x0006: count = 2
			MEMORY_PROPERTY_HOST_VISIBLE_BIT
			MEMORY_PROPERTY_HOST_COHERENT_BIT
		usable for:
			IMAGE_TILING_OPTIMAL:
				None
			IMAGE_TILING_LINEAR:
				color images
				(non-sparse, non-transient)
	memoryTypes[9]:
		heapIndex     = 1
		propertyFlags = 0x000e: count = 3
			MEMORY_PROPERTY_HOST_VISIBLE_BIT
			MEMORY_PROPERTY_HOST_COHERENT_BIT
			MEMORY_PROPERTY_HOST_CACHED_BIT
		usable for:
			IMAGE_TILING_OPTIMAL:
				None
			IMAGE_TILING_LINEAR:
				color images
				(non-sparse, non-transient)
	memoryTypes[10]:
		heapIndex     = 2
		propertyFlags = 0x0007: count = 3
			MEMORY_PROPERTY_DEVICE_LOCAL_BIT
			MEMORY_PROPERTY_HOST_VISIBLE_BIT
			MEMORY_PROPERTY_HOST_COHERENT_BIT
		usable for:
			IMAGE_TILING_OPTIMAL:
				None
			IMAGE_TILING_LINEAR:
				color images
				(non-sparse, non-transient)

exaexa avatar Mar 28 '22 11:03 exaexa

Hi, thanks for raising the issue.

The correct usage is to restrict the memory_types field according to the memory_type_count field of the PhysicalDeviceMemoryProperties. That is because internally the specification uses a tuple of fixed size into which it writes elements, along with count values (both memory_type_count and memory_heap_count for heaps). This checks out with the output from vulkaninfo.

While the workaround is clear, it would make for less confusion to just special-case this struct/API call to have variable-size Vectors for heaps and memory types that are indexed with the right count from the start (such that you can rely on the command you pasted without having to check for the counts manually).

serenity4 avatar Mar 28 '22 13:03 serenity4

hi! thanks for quick reply. I kindof figured I should simply not read more items than what's specified in memory_type/heap_count, otoh it is certainly a bit surprising that there's actual vector with uninitialized memory getting returned. Perhaps a small warning in documentation would suffice.

(Btw how is this different from other array-returning functions (such as the queries for queue families) that seem to behave just right?)

exaexa avatar Mar 28 '22 13:03 exaexa

(Btw how is this different from other array-returning functions (such as the queries for queue families) that seem to behave just right?)

The difference is in the API. vkGetPhysicalDeviceMemoryProperties does not allow you to query a count, initialize the array, and query for data; it just retrieves the data in one go. I don't know why they adopted this particular design, but it's confusing for sure.

serenity4 avatar Apr 02 '22 18:04 serenity4

As of 1.3.207, it seems that the issue with bounded arrays containing garbage concerns the following structures:

  • VkDeviceGroupPresentCapabilitiesKHR
  • VkPhysicalDeviceGroupProperties
  • VkQueueFamilyGlobalPriorityPropertiesKHR
  • VkPhysicalDeviceMemoryBudgetPropertiesEXT
  • VkPhysicalDeviceMemoryProperties

Ideally, instead of having docs about this (since there are lots of things described in the docs already) we should replace these fixed-size tuples with Vectors that contain only valid elements.

serenity4 avatar Apr 02 '22 20:04 serenity4

Hi, thanks for the detailed explanation, actually when compared to vulkan API this makes much more sense now. I guess the methods in question aren't particularly performance-center-ish so a bit of extra processing for making the vectors look nice won't be a problem right? (Perhaps returning a correct "view" into the vector should be sufficient?)

exaexa avatar Apr 03 '22 11:04 exaexa

That's what I think as well, as these structures look like they'll be used mostly in application setup/initialization. Returning a view assumes that the original vector exists somewhere, so I think it would be easier to remove the original fields (count + fixed-size tuple) to replace that with a vector. Though there are several possibilities, we should probably take whichever is easiest on the generator code.

serenity4 avatar Apr 03 '22 11:04 serenity4

:thinking: looks like the structures aren't static, at least SwiftShader is copying the data out: https://github.com/google/swiftshader/blob/master/src/Vulkan/libVulkan.cpp#L672-L677

exaexa avatar Apr 03 '22 11:04 exaexa

I guess that depends what the vk::PhysicalDevice::GetMemoryProperties() does, pMemoryProperties is intended to be a pointer filled with new data. Or maybe I didn't understand your comment?

serenity4 avatar Apr 03 '22 21:04 serenity4

Ah yes I thought that the data would be laying somewhere indefinitely and we could just safely pick a view for them. Unfortunately not the case :sweat_smile:

exaexa avatar Apr 05 '22 07:04 exaexa