cuCollections

[FEA] Add tests that verify vector loads are generated when expected

Open jrhemstad opened this issue 1 year ago • 0 comments

Generating vector load instructions is important for performance.

However, the compiler can be very fussy about actually generating them.

For example, https://godbolt.org/z/sfdaj695P shows prototype code for which every version of nvcc prior to 11.7 generates a single ld.global.v4.u32 as expected, but 11.7 generates four separate ld.global.u32 instructions. Worse, 11.7 checks the condition after each load, so the loads won't be coalesced.

It would be nice to have a simple test that verifies vector loads (or stores) are generated in code where we expect them to be generated.

We should be sure to verify that vector SASS instructions are generated (and not just vector PTX instructions), because I've seen cases where the PTX has vector loads but the SASS does not.
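To make the PTX-versus-SASS distinction concrete, here is a minimal sketch of a check that looks at both dumps and flags the case where a vector load appears in PTX but not in SASS. The function name `vector_load_report` and both regular expressions are assumptions for illustration, not part of cuCollections. The SASS pattern in particular is a guess at the mnemonic shape (for example LDG.E.128.SYS), and may need adjusting per GPU architecture.

```python
import re

# Assumed PTX shape: ld.global[.modifiers].v2/.v4.<type>, e.g. ld.global.v4.u32
PTX_VECTOR_LOAD = re.compile(r"\bld\.global(?:\.\w+)*\.v[24]\.\w+")

# Assumed SASS shape: LDG with a 64- or 128-bit width suffix, e.g. LDG.E.128.SYS
SASS_VECTOR_LOAD = re.compile(r"\bLDG(?:\.\w+)*?\.(?:64|128)\b")


def vector_load_report(ptx: str, sass: str) -> dict:
    """Report whether vector loads appear in the PTX text and in the SASS text."""
    in_ptx = bool(PTX_VECTOR_LOAD.search(ptx))
    in_sass = bool(SASS_VECTOR_LOAD.search(sass))
    return {
        "ptx": in_ptx,
        "sass": in_sass,
        # The failure mode described above: vectorized in PTX, scalar in SASS.
        "lost_in_sass": in_ptx and not in_sass,
    }
```

The two input strings would typically come from `cuobjdump -ptx` and `cuobjdump -sass` run on the same binary; a test would then fail whenever `lost_in_sass` is true or `sass` is false.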

This could be as simple as using cuobjdump -sass a.out | grep LDG.E.128.SYS, but there are probably cleverer ways to do it.
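As a rough sketch of how the cuobjdump check could become an automated test, the script below wraps the same command and counts 128-bit global loads in its output. The helper names (`dump_sass`, `count_wide_loads`, `assert_wide_loads`) are hypothetical, and the pattern deliberately ignores trailing modifiers such as .SYS on the assumption that they may differ between architectures.

```python
import re
import subprocess

# Matches LDG mnemonics carrying a 128-bit width, e.g. LDG.E.128.SYS.
# Unlike the unescaped grep pattern, the dots here are literal.
WIDE_LOAD = re.compile(r"\bLDG(?:\.\w+)*?\.128\b")


def dump_sass(binary: str) -> str:
    """Return the SASS disassembly of `binary` (requires cuobjdump on PATH)."""
    result = subprocess.run(
        ["cuobjdump", "-sass", binary],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_wide_loads(sass: str) -> int:
    """Count 128-bit global load instructions in a SASS dump."""
    return len(WIDE_LOAD.findall(sass))


def assert_wide_loads(sass: str, minimum: int = 1) -> None:
    """Raise AssertionError if fewer than `minimum` 128-bit loads are present."""
    found = count_wide_loads(sass)
    assert found >= minimum, f"expected >= {minimum} 128-bit loads, found {found}"
```

A test would call `assert_wide_loads(dump_sass("a.out"))` after building the kernel. Keeping the parsing separate from the subprocess call lets the matcher itself be exercised on canned SASS text without a GPU toolchain.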

jrhemstad, Aug 14 '22 14:08