
[JuliaCon/proceedings-review] Performance metrics

Open georgebisbas opened this issue 2 years ago • 4 comments

Hi all,

q1) What is the reason behind focusing on T_eff rather than Gpts/s, which is commonly used in papers reporting stencil performance?

q2) Figure 2 shows that performance drops slightly with the math-close notation compared to explicitly expressing the stencil computation. Where does this slowdown come from?

georgebisbas avatar Nov 17 '23 11:11 georgebisbas

Thank you for the questions, @georgebisbas.

q1) What is the reason behind focusing on T_eff rather than Gpts/s, which is commonly used in papers reporting stencil performance?

The reason is that for T_eff we can define a theoretical upper bound in a straightforward fashion: it is simply T_peak, the peak memory throughput of the hardware used.
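To make the contrast between the two metrics concrete, here is a minimal sketch of how they are computed. It assumes the usual effective-throughput definition (each unknown field is read and written once per iteration, each known field is read once, giving A_eff = (2·D_u + D_k)·precision·N); the field counts, grid size, and timing in the example are hypothetical, not measurements from the paper.

```python
def metrics(nx, ny, nz, n_unknown, n_known, precision_bytes, t_it):
    """Return (T_eff in GB/s, throughput in Gpts/s) for one iteration.

    nx, ny, nz      : grid dimensions
    n_unknown       : fields that must be read AND written each iteration
    n_known         : fields that must only be read each iteration
    precision_bytes : 8 for Float64, 4 for Float32
    t_it            : wall time per iteration in seconds
    """
    n = nx * ny * nz
    # Effective memory access per iteration [bytes]: only the traffic
    # that is strictly necessary is counted, so T_eff is bounded above
    # by T_peak, the hardware's peak memory throughput.
    a_eff = (2 * n_unknown + n_known) * precision_bytes * n
    t_eff = a_eff / t_it / 1e9  # GB/s, to be compared against T_peak
    # Gpts/s has no such stencil-independent hardware bound.
    gpts = n / t_it / 1e9       # grid points processed per second
    return t_eff, gpts

# Hypothetical example: a 512^3 solve with 1 unknown and 1 known field,
# Float64, taking 5 ms per iteration.
t_eff, gpts = metrics(512, 512, 512, 1, 1, 8, 5e-3)
```

Note that the two numbers differ only by the constant factor (2·D_u + D_k)·precision_bytes, so Gpts/s can always be recovered from T_eff for a given kernel; what T_eff adds is a direct comparison against T_peak.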

q2) Figure 2 shows that performance drops slightly with the math-close notation compared to explicitly expressing the stencil computation. Where does this slowdown come from?

The slowdown comes from the generation of slightly more complex code, for example to avoid out-of-bounds accesses.
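The following is a toy illustration of that effect, not the code ParallelStencil actually generates: a 1-D 3-point stencil written two ways. The "explicit" version loops only over interior points, so no per-point checks are needed; the "guarded" version loops over the full range and checks bounds at every point, mimicking the slightly more complex code a math-close notation may produce to stay in bounds. Both return the same result, but the guarded version executes extra branches.

```python
def stencil_explicit(a):
    """Explicit stencil: loop bounds already exclude the boundary,
    so no per-point bounds checks are required."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = a[i - 1] - 2 * a[i] + a[i + 1]
    return out

def stencil_guarded(a):
    """Same stencil over the full index range, with a per-point guard
    against out-of-bounds accesses: identical result, extra branches."""
    out = list(a)
    for i in range(len(a)):
        if 1 <= i <= len(a) - 2:  # guard evaluated at every point
            out[i] = a[i - 1] - 2 * a[i] + a[i + 1]
    return out
```

On a GPU, such guards are typically cheap but not free, which is consistent with the small (rather than large) performance gap visible in Figure 2.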

omlins avatar Dec 05 '23 11:12 omlins

Thank you for your answers @omlins. Regarding q1, is it possible to also add Gpts/s for the experiments executed? I think it would be a useful addition.

georgebisbas avatar Jan 15 '24 12:01 georgebisbas

@georgebisbas : thank you for your suggestion. We will try to accommodate it in the same plot.

omlins avatar Feb 01 '24 09:02 omlins

Regarding q2, if one runs the code with bounds checking deactivated, is the performance regained?

svretina avatar Feb 16 '24 21:02 svretina