inplace_it Performance overhead of indirect() placement

Here is the latest Firefox profile of the following stack: JS WebGPU -> wgpu -> gfx-backend-vulkan -> inplace_it

https://share.firefox.dev/2RmndRr

What I found peculiar is that inplace_or_alloc_from_iter accounts for only half of the time spent in indirect.

What else is indirect doing? Can we reduce this overhead?

Apr 09 '21 14:04 kvark

Hi! indirect is a necessary evil that prevents function inlining. See https://github.com/NotIntMan/inplace_it/issues/4 for a description of the problem.
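For readers unfamiliar with the pattern, here is a minimal sketch of what a non-inlined indirection wrapper can look like. It is an illustration only: the `indirect` below is a hypothetical stand-in written for this example, not code copied from inplace_it, and `sum_on_stack` is an invented caller. The assumption is that the wrapper exists so that a large stack buffer lives in its own frame rather than being merged into the caller's frame.

```rust
// Hypothetical stand-in for a non-inlined wrapper (an assumption for
// illustration, not the actual inplace_it source).
// `#[inline(never)]` asks the compiler to keep this as a real call,
// so the closure body gets its own stack frame.
#[inline(never)]
fn indirect<R>(fun: impl FnOnce() -> R) -> R {
    fun()
}

// Invented example caller: the 64-element buffer is declared inside the
// closure, so it belongs to the frame created by the `indirect` call.
fn sum_on_stack(values: &[u32]) -> u32 {
    indirect(|| {
        let mut buffer = [0u32; 64];
        // Copy at most 64 values into the stack buffer.
        let len = values.len().min(buffer.len());
        buffer[..len].copy_from_slice(&values[..len]);
        buffer[..len].iter().sum()
    })
}
```

The cost of such a wrapper is one real function call plus whatever the closure body does, which is why time attributed to it in a profile may actually belong to code inlined into the closure.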

Could you provide more details about the current problem? I don't fully understand what it is.

Apr 09 '21 16:04 NotIntMan

I assume you mean the high time consumption in this function. If so, the problem is not in the function itself but in the library code (remember that inlining is prohibited for this function only; other code is not limited by this rule). In that case, a detailed investigation of the weak point will be required.

Apr 09 '21 17:04 NotIntMan

I haven't investigated this myself in depth yet. What I see is that a lot of time is spent in indirect() itself, outside of the payload (the library code) I'm actually running. The code looks like this:

    let sets_iter = sets.map(|set| set.raw);

    inplace_or_alloc_from_iter(sets_iter, |sets| {
        inplace_or_alloc_from_iter(offsets, |dynamic_offsets| unsafe {
            self.device.raw.cmd_bind_descriptor_sets(
                self.raw,
                bind_point,
                layout.raw,
                first_set as u32,
                &sets,
                &dynamic_offsets,
            );
        });
    });

What I expect is for the nested inplace_or_alloc_from_iter to be essentially free. Instead, it appears to cost as much as cmd_bind_descriptor_sets itself. I guess we could explain it by some copying of data taking place, but it's still suspicious. Let's treat this issue as a call to inspect what exactly is going on there.
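To make the "copying of data" concrete, here is a rough sketch of the general technique: drain an iterator into a fixed-size stack buffer and fall back to a heap allocation when it does not fit. This is not how inplace_it is implemented. `with_stack_or_heap`, its capacity of 8, and the `Copy + Default` bounds are all assumptions made up for this example, and `nested_total` is an invented caller that mirrors the nested shape of the snippet above.

```rust
// Hypothetical helper illustrating "stack buffer or heap fallback".
// Not the inplace_it API: the name, bounds, and capacity are assumptions.
fn with_stack_or_heap<T: Copy + Default, R>(
    mut iter: impl Iterator<Item = T>,
    consumer: impl FnOnce(&[T]) -> R,
) -> R {
    const CAP: usize = 8;
    let mut buf = [T::default(); CAP];
    let mut len = 0;
    // Every item is copied once into the stack buffer.
    while len < CAP {
        match iter.next() {
            Some(item) => {
                buf[len] = item;
                len += 1;
            }
            None => return consumer(&buf[..len]),
        }
    }
    // The buffer is full: either the iterator is done, or we spill to the heap.
    match iter.next() {
        None => consumer(&buf[..len]),
        Some(extra) => {
            let mut spilled = Vec::with_capacity(CAP * 2);
            spilled.extend_from_slice(&buf); // second copy of the first CAP items
            spilled.push(extra);
            spilled.extend(iter);
            consumer(&spilled)
        }
    }
}

// Invented nested usage, shaped like the two nested calls in the snippet above.
fn nested_total(sets: &[u32], offsets: &[u32]) -> u32 {
    with_stack_or_heap(sets.iter().copied(), |sets| {
        with_stack_or_heap(offsets.iter().copied(), |offsets| {
            sets.iter().sum::<u32>() + offsets.iter().sum::<u32>()
        })
    })
}
```

In a scheme like this, each nesting level pays for iterating and copying its items before the payload runs, so the wrappers are cheap but not free. Whether that accounts for the time seen in the profile is exactly what would need measuring.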

Apr 09 '21 21:04 kvark
