cuda_benchmark

Small error in instructions.cu example (mul_op)

Open: col-mcc opened this issue 1 year ago • 0 comments

mul_op (int) in the instructions.cu example is actually doing an addition!

I'm new to CUDA, but I presume the 'add.s32' should be 'mul.lo.s32'.

The output in the readme looks to be reflecting this error too.
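For illustration, here is a minimal sketch of what an integer multiply via inline PTX could look like, and how to check that it really multiplies. The names `mul_op_int`, `check_kernel`, and `run_mul_op` are hypothetical stand-ins, not the actual code from instructions.cu, and I'm assuming the example uses inline `asm` in a similar way.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the int mul_op in instructions.cu.
// The point is the opcode: mul.lo.s32 (low 32 bits of the product),
// not add.s32.
struct mul_op_int {
    __device__ int operator()(int a, int b) const {
        int r;
        asm volatile("mul.lo.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
        return r;
    }
};

// Single-thread kernel that applies the op once and stores the result.
__global__ void check_kernel(int a, int b, int* out) {
    *out = mul_op_int{}(a, b);
}

// Host helper (hypothetical): runs the op on the device and returns the result.
// Returns -1 if any CUDA call fails, e.g. when no device is available.
int run_mul_op(int a, int b) {
    int* d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    check_kernel<<<1, 1>>>(a, b, d_out);
    cudaError_t err =
        cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}

int main() {
    // 6 * 7 = 42 with mul.lo.s32; add.s32 would give 13 instead.
    int result = run_mul_op(6, 7);
    printf("mul_op(6, 7) = %d\n", result);
    return result == 42 ? 0 : 1;
}
```

A quick check like this (compiled with `nvcc`) makes it easy to tell an accidental add from a multiply, independent of the timing numbers.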

I tested out the impact of making this change on a Tesla T4 and it went from -

    int add    1.89  3  87.044762  3200 (3276800)
    ...
    int mul    1.89  3  87.348724  3200 (3276800)
    float mul  3.14  5  62.641941  3200 (3276800)

to -

    int mul    3.14  5  62.652721  3200 (3276800)
    float mul  3.14  5  62.641941  3200 (3276800)

(So with the fix, int and float mul take roughly equal amounts of time.)

col-mcc, Oct 23 '24 22:10