`CpuBufferPool` slower than glium

Open KeyboardDanni opened this issue 5 years ago • 3 comments

  • Version of vulkano: 0.19.0
  • OS: Arch Linux
  • GPU (the selected PhysicalDevice): GeForce GTX 1060 6GB
  • GPU Driver: nVidia 455.38 (Driver version 0x71c98000, Vulkan version 1.2.142)
  • Upload of a reasonably minimal complete main.rs file that demonstrates the issue: Run Keeshond Doggymark example in latest git, reference code for the Vulkan renderer is here: https://gitlab.com/cosmicchipsocket/keeshond/-/blob/c91fbb2a011be18cb462a8173725730ce2052ceb/keeshond/src/renderer/vulkan.rs#L272 Sorry I can't provide anything more concise, but it was hard enough setting up vulkano for this project as-is.

Issue

When using vulkano to draw instanced quads, the overhead for each draw is actually larger than glium, defeating the whole point of using vulkano in the first place.

I need low draw call overhead for my 2D sprite-based engine for scenarios where ordered draws involve lots of texture changes, as these can't be batched.

For each draw operation I do the following:

let chunk = self.instance_buffer.chunk(self.instance_buffer_src.clone()).expect("Failed to allocate buffer chunk");
let vertex_slice = self.vertex_buffer.clone();

builder.draw_indexed(self.pipeline.clone(), &self.dynamic_state,
                     vec![vertex_slice, Arc::new(chunk)],
                     self.index_buffer.clone(), (), ()).expect("Failed to draw buffer");

The problem is twofold. First, chunk() seems to perform a lot of allocations, or otherwise takes a long time to decide which chunk to use and whether to allocate. Second, I have to create a new Arc for every call to draw_indexed().

If I remove batching, callgrind reports 23.97% of the time spent in vulkano::buffer::cpu_pool::CpuBufferPool<T,A>::try_next_impl and 49.89% in __memcpy_avx_unaligned_erms, which is called from core::ptr::drop_in_place'2 and seems to come from dropping the Arc. Without this overhead I suspect vulkano would be quite fast, but right now this bottleneck is blocking me from working on the rest of the renderer.
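To make the cost being described concrete, here is a minimal sketch of the alternative shape: one preallocated staging area per frame, sub-allocated by bumping an offset, so that each draw gets a range instead of a freshly allocated, reference-counted chunk. This is plain Rust and not vulkano API; `FrameArena`, `push_slice`, and `reset` are hypothetical names used only for illustration, and the sketch ignores GPU synchronization entirely.

```rust
use std::ops::Range;

/// Hypothetical per-frame arena (not a vulkano type): a single block of
/// storage that is sub-allocated by appending, then reset once per frame.
struct FrameArena<T> {
    storage: Vec<T>,
    capacity: usize,
}

impl<T: Copy> FrameArena<T> {
    /// Allocates the backing storage once, up front.
    fn with_capacity(capacity: usize) -> Self {
        FrameArena {
            storage: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Copies `items` into the arena and returns the element range they
    /// occupy, or `None` if they do not fit. Nothing is allocated here,
    /// because the length never exceeds the capacity reserved up front.
    fn push_slice(&mut self, items: &[T]) -> Option<Range<usize>> {
        let start = self.storage.len();
        let end = start.checked_add(items.len())?;
        if end > self.capacity {
            return None;
        }
        self.storage.extend_from_slice(items);
        Some(start..end)
    }

    /// Makes the whole arena available again, e.g. at the start of a frame.
    fn reset(&mut self) {
        self.storage.clear();
    }
}

fn main() {
    // Assumption: one instance is four floats; the real layout may differ.
    let mut arena: FrameArena<[f32; 4]> = FrameArena::with_capacity(8192);
    let first = arena.push_slice(&[[0.0; 4]; 3]).expect("arena full");
    let second = arena.push_slice(&[[1.0; 4]; 2]).expect("arena full");
    println!("first draw uses {:?}, second draw uses {:?}", first, second);
    arena.reset();
}
```

In a real renderer the returned range would have to be mapped onto a slice of a GPU-visible buffer, and the arena could only be reset once the GPU has finished reading it. Both of those are outside what this sketch shows.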

I tried to do buffering myself using a Vec of CpuAccessibleBuffer objects, but I ran into #1429 and #1433 while trying to implement this.

Any help on this is greatly appreciated.

KeyboardDanni avatar Nov 07 '20 19:11 KeyboardDanni

If you want to draw 2D sprites, try using blit image or copy image and a single quad plane.

KentaTheBugMaker avatar Nov 12 '20 15:11 KentaTheBugMaker

> if you want draw 2d sprite try use blit image or copy image and 1 quad plane

I need to be able to apply transforms, blending, and shaders, so this is a no-go. Additionally, I would have to issue a separate command for every single draw operation, which I don't think would scale well to ~400k sprites. I need to be able to fill an 8k instance buffer so that I can give each sprite a different transform, alpha, etc. within the shader and still have everything be fast, and draw all that at once.
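The batching constraint described here can be sketched roughly as follows: sprites are submitted in order, consecutive sprites sharing a texture are merged into one instanced draw, and a batch is closed when the texture changes or the instance buffer is full. This is an illustrative sketch in plain Rust, not the engine's actual code. `Sprite`, `plan_draws`, and `MAX_INSTANCES` are hypothetical names, and reading "8k" as 8192 instances is an assumption.

```rust
/// Assumed capacity of the instance buffer, in instances.
const MAX_INSTANCES: usize = 8192;

/// Hypothetical per-sprite data; a real instance would also carry a transform.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Sprite {
    texture_id: u32,
    alpha: f32,
}

/// Returns one `(texture_id, instance_count)` entry per draw call, in
/// submission order. Draw order is preserved, so only consecutive sprites
/// with the same texture can share a draw.
fn plan_draws(sprites: &[Sprite], max_instances: usize) -> Vec<(u32, usize)> {
    let mut draws: Vec<(u32, usize)> = Vec::new();
    for sprite in sprites {
        // Extend the current batch only if the texture matches and there is room.
        let extend = match draws.last() {
            Some(&(tex, count)) => tex == sprite.texture_id && count < max_instances,
            None => false,
        };
        if extend {
            if let Some(last) = draws.last_mut() {
                last.1 += 1;
            }
        } else {
            draws.push((sprite.texture_id, 1));
        }
    }
    draws
}

fn main() {
    let sprite = |texture_id: u32| Sprite { texture_id, alpha: 1.0 };
    let sprites = vec![sprite(0), sprite(0), sprite(1), sprite(0)];
    for (texture_id, count) in plan_draws(&sprites, MAX_INSTANCES) {
        println!("draw texture {} with {} instance(s)", texture_id, count);
    }
}
```

When textures alternate on every sprite, this degenerates to one draw per sprite, which is the case where the per-draw overhead dominates.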

KeyboardDanni avatar Nov 16 '20 02:11 KeyboardDanni

I recently noticed when profiling my code that try_next_impl is taking up an awful lot of time. That seems to be connected to this issue.

Rua avatar Jan 23 '21 10:01 Rua