yaksa
Make fast-path hooks inline
The current backend defines all hooks as function pointers. Some hooks are on the fast path, or are called multiple times within a single ipack/iunpack call. The compiler cannot optimize much across function-pointer calls (for example, it generally cannot inline them). We want to make these "fast" hooks inline.
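To make the difference concrete, here is a minimal sketch of the two styles. All names (`copy_hook_fn`, `backend_hooks`, `copy_hook_inline`, `pack_via_pointer`, `pack_via_inline`) are hypothetical and only for illustration; they are not the actual yaksa backend API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook signature: copy n bytes. Not yaksa's real API. */
typedef int (*copy_hook_fn)(void *dst, const void *src, size_t n);

static int copy_impl(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    return 0;
}

/* Style 1: hook stored in a table. In a real backend the table is
 * filled in at init time, so each call is an indirect call that the
 * compiler usually cannot inline. */
struct hooks {
    copy_hook_fn copy;
};
struct hooks backend_hooks = { .copy = copy_impl };

/* Style 2: static inline wrapper. The callee is visible at the call
 * site, so the compiler is free to inline it into the loop. */
static inline int copy_hook_inline(void *dst, const void *src, size_t n)
{
    return copy_impl(dst, src, n);
}

/* A pack-like loop that calls the hook once per element. */
static int pack_via_pointer(char *dst, const char *src, size_t count, size_t elem)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        int rc = backend_hooks.copy(dst + i * elem, src + i * elem, elem);
        if (rc != 0)
            return rc;
    }
    return 0;
}

static int pack_via_inline(char *dst, const char *src, size_t count, size_t elem)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        int rc = copy_hook_inline(dst + i * elem, src + i * elem, elem);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Both loops produce the same result; the only difference is whether the compiler can see through the hook call. Whether that matters in practice depends on how often the hook is called relative to the cost of the work it does.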
@minsii The DMA access latencies for most GPUs are in microseconds. In comparison, a function pointer dereference is ~25 cycles. So perhaps an expected gain comparison is useful before doing this?
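A rough back-of-the-envelope version of that comparison might look like the sketch below. The ~25 cycles figure is from the comment above; the clock rate, the operation latency, and the number of hook calls are assumptions to be replaced with measured values, and `hook_overhead_fraction` is a hypothetical helper.

```c
/* Fraction of one operation's latency spent on indirect hook calls.
 *   cycles_per_call : cost of one function-pointer call, in cycles
 *   clock_ghz       : CPU clock in GHz (assumption, measure on target)
 *   latency_us      : latency of the GPU/DMA operation, in microseconds
 *   calls           : hook calls made per operation (assumption)
 */
static double hook_overhead_fraction(double cycles_per_call, double clock_ghz,
                                     double latency_us, int calls)
{
    double overhead_ns = cycles_per_call / clock_ghz * (double) calls;
    double latency_ns = latency_us * 1000.0;
    return overhead_ns / latency_ns;
}
```

For example, with 25 cycles per call, an assumed 2.5 GHz clock, an assumed 1 microsecond operation latency, and one hook call, the overhead is 10 ns out of 1000 ns, or about 1%. The fraction grows linearly with the number of hook calls per operation, so the count per ipack/iunpack call is the number worth measuring.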
Thanks for the suggestion, @pavanbalaji. I will look into the expected gain before changing the code.