inplace_it
Performance overhead of indirect() placement
Here is the latest Firefox profile, captured while running this stack: JS WebGPU -> wgpu -> gfx-backend-vulkan -> inplace_it
https://share.firefox.dev/2RmndRr
What I found peculiar is that inplace_or_alloc_from_iter accounts for only half of the time spent in indirect.

What else is indirect doing? Can we reduce this overhead?
Hi! indirect is a necessary evil: it exists to prevent function inlining. See https://github.com/NotIntMan/inplace_it/issues/4 for a description of the problem.
Could you provide more details about the current problem? I don't fully understand what it is.
I assume you mean the high time consumption in this function. If so, the problem is not in the function itself but in the library code it calls (remember that inlining is prohibited for this function, but other code is not limited by this rule). In that case, a detailed investigation of the bottleneck will be required.
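For readers unfamiliar with the pattern, here is a minimal sketch of why a non-inlined wrapper can matter. It assumes (this is not copied from inplace_it's source) that the goal is to give each fixed-size stack buffer its own frame, so a small request does not pay for the stack space of the largest size class. `indirect` and `with_stack_buffer` below are hypothetical stand-ins, not the crate's API.

```rust
/// Hypothetical stand-in for an `indirect` helper: the closure body runs
/// inside a separate stack frame because this function is never inlined.
#[inline(never)]
fn indirect<R>(f: impl FnOnce() -> R) -> R {
    f()
}

/// Hypothetical sketch: run `consumer` on a zeroed buffer of exactly `len` u32s,
/// using a stack array sized by "size class" and falling back to the heap.
fn with_stack_buffer<R>(len: usize, consumer: impl FnOnce(&mut [u32]) -> R) -> R {
    match len {
        // Each arm's array lives in its own `indirect` frame. If these closures
        // were inlined, the optimizer could reserve the 64-element array in the
        // caller's frame even for a 3-element request.
        0..=8 => indirect(move || {
            let mut buf = [0u32; 8];
            consumer(&mut buf[..len])
        }),
        9..=64 => indirect(move || {
            let mut buf = [0u32; 64];
            consumer(&mut buf[..len])
        }),
        // Too large for the stack classes: allocate on the heap instead.
        _ => {
            let mut buf = vec![0u32; len];
            consumer(&mut buf[..])
        }
    }
}
```

The trade-off is that every call through `indirect` is a real, non-inlined function call, so anything the closure captures and the closure body itself show up under `indirect` in a profiler rather than in the caller.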
I haven't investigated this in depth myself yet. What I see is that a lot of time is spent in indirect() itself, outside of the payload (the library code) I'm actually running. The code looks like this:
let sets_iter = sets.map(|set| set.raw);
inplace_or_alloc_from_iter(sets_iter, |sets| {
    inplace_or_alloc_from_iter(offsets, |dynamic_offsets| unsafe {
        self.device.raw.cmd_bind_descriptor_sets(
            self.raw,
            bind_point,
            layout.raw,
            first_set as u32,
            &sets,
            &dynamic_offsets,
        );
    });
});
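To make the work in the nested calls above concrete, here is a self-contained approximation. It is a sketch under stated assumptions, not inplace_it's implementation: `collect_on_stack`, `STACK_CAP` and `fake_bind_descriptor_sets` are hypothetical names, and the fixed capacity of 8 is arbitrary. It shows that each level copies its iterator element by element into a buffer and adds a non-inlined frame before the payload runs.

```rust
/// Hypothetical fixed stack capacity for the sketch.
const STACK_CAP: usize = 8;

/// Hypothetical helper, NOT the inplace_it API: copies items from `iter` into a
/// stack array, falling back to a Vec if more than STACK_CAP items arrive.
#[inline(never)] // mimics the extra, non-inlined frame seen in the profile
fn collect_on_stack<T, R>(
    iter: impl IntoIterator<Item = T>,
    consumer: impl FnOnce(&mut [T]) -> R,
) -> R
where
    T: Copy + Default,
{
    let mut iter = iter.into_iter();
    let mut buf = [T::default(); STACK_CAP];
    let mut len = 0;
    // Element-by-element copy: work a caller already holding a slice would skip.
    while len < STACK_CAP {
        match iter.next() {
            Some(item) => {
                buf[len] = item;
                len += 1;
            }
            None => return consumer(&mut buf[..len]),
        }
    }
    // The stack buffer is full; spill to the heap only if items remain.
    match iter.next() {
        None => consumer(&mut buf[..len]),
        Some(next) => {
            let mut v: Vec<T> = buf.to_vec();
            v.push(next);
            v.extend(iter);
            consumer(&mut v[..])
        }
    }
}

/// Hypothetical stand-in for the driver call; it just sums its inputs.
fn fake_bind_descriptor_sets(sets: &[u64], dynamic_offsets: &[u32]) -> u64 {
    sets.iter().sum::<u64>() + dynamic_offsets.iter().map(|&o| o as u64).sum::<u64>()
}

/// Mirrors the nested structure of the snippet: two copies, two frames, one payload.
fn bind(sets: &[u64], offsets: &[u32]) -> u64 {
    collect_on_stack(sets.iter().copied(), |sets| {
        collect_on_stack(offsets.iter().copied(), |dynamic_offsets| {
            fake_bind_descriptor_sets(sets, dynamic_offsets)
        })
    })
}
```

If the payload is cheap, the copies and extra frames in a structure like this can plausibly be of the same order as the payload itself. Whether that explains the numbers in the profile would still need to be checked against the real code.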
I expect the nested inplace_or_alloc_from_iter calls to be essentially free. Instead, they appear to cost as much as cmd_bind_descriptor_sets itself. Some of this could be explained by data being copied, but it's still suspicious. Let's treat this issue as a call to inspect what exactly is going on there.