Integrate involute surface into ORANGE runtime

Open VHLM2001 opened this issue 1 year ago • 1 comments

Bring the involute geometry implemented in pull 210ce4422 into the runtime. Before merging, the affected sections of the code should be analyzed for compilation efficiency, and checks should be added to ensure that the involute code does not slow down unrelated sections when it is not used.

VHLM2001 avatar Jul 30 '24 20:07 VHLM2001

Well, the GPU performance (even when not using involutes) seems to take a substantial hit (~40% longer for testem3 without involutes!) from this branch. It's easy to see why from looking at the occupancy change in the initialize-tracks kernel:

 {
 "const_mem": 0,
 "heap_size": 8388608,
-"local_mem": 152,
-"max_blocks_per_cu": 5,
+"local_mem": 160,
+"max_blocks_per_cu": 2,
 "max_threads_per_block": 256,
-"max_warps_per_eu": 40,
+"max_warps_per_eu": 16,
 "name": "initialize-tracks",
-"num_regs": 48,
-"occupancy": 0.625,
+"num_regs": 92,
+"occupancy": 0.25,
 "print_buffer_size": 5242880,
 "stack_size": 1024,
 "threads_per_block": 256

and along-step-neutral:

 {
 "const_mem": 0,
 "heap_size": 8388608,
-"local_mem": 0,
+"local_mem": 288,
 "max_blocks_per_cu": 2,
 "max_threads_per_block": 256,
 "max_warps_per_eu": 16,
 "name": "along-step-neutral",
-"num_regs": 123,
+"num_regs": 128,
 "occupancy": 0.25,
 "print_buffer_size": 5242880,
 "stack_size": 1024,

The local memory usage goes way up and the occupancy goes way down (where it can). For now I'll try disabling the runtime code paths and see if performance goes back to normal.
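
As a rough cross-check of the numbers above, the register-limited occupancy can be estimated with a short script. This is only a sketch: it assumes 65,536 registers and at most 64 resident warps per compute unit, with 32 threads per warp. None of those limits are stated in this thread. It also ignores register allocation granularity, shared memory, and local memory, and the function name is hypothetical.

```python
def register_limited_occupancy(num_regs, threads_per_block,
                               regs_per_cu=65536, max_warps_per_cu=64,
                               warp_size=32):
    """Estimate (blocks per CU, resident warps, occupancy) from register use.

    The hardware limits are assumed defaults, not values read from the
    kernel diagnostics.
    """
    regs_per_block = num_regs * threads_per_block
    # Round up so a partially filled warp still counts as a whole warp.
    warps_per_block = -(-threads_per_block // warp_size)
    # Resident blocks are capped by the register file and by the warp limit.
    blocks_by_regs = regs_per_cu // regs_per_block
    blocks_by_warps = max_warps_per_cu // warps_per_block
    blocks = min(blocks_by_regs, blocks_by_warps)
    warps = blocks * warps_per_block
    return blocks, warps, warps / max_warps_per_cu


if __name__ == "__main__":
    # num_regs values are taken from the diagnostics diff above.
    cases = [
        ("initialize-tracks (before)", 48),
        ("initialize-tracks (after)", 92),
        ("along-step-neutral (before)", 123),
        ("along-step-neutral (after)", 128),
    ]
    for label, regs in cases:
        blocks, warps, occ = register_limited_occupancy(regs, 256)
        print(f"{label}: blocks={blocks} warps={warps} occupancy={occ}")
```

Under these assumptions the estimate matches the reported diagnostics: at 48 registers, five 256-thread blocks fit (40 warps, occupancy 0.625), while at 92 registers only two fit (16 warps, occupancy 0.25). The along-step-neutral kernel was already limited to two blocks at 123 registers, so going to 128 does not change its occupancy.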

sethrj avatar Aug 16 '24 17:08 sethrj

@VHLM2001 Great work this summer! As we discussed last Friday, it's disappointing but unsurprising that this ends up being a burden on the GPU. Next week (?) I can see if we can disable the code paths on GPU so that we can get this merged (and let someone else work on optimization).

sethrj avatar Aug 19 '24 18:08 sethrj

I just ran a before-and-after comparison for this branch. The runtime "not reachable" check seems to be doing its job, but the CPU still takes a runtime hit. I'm not sure why, since it should only hit a "throw" in a cold section of the code. There should be no difference...

[Image: rel-throughput]

sethrj avatar Aug 29 '24 18:08 sethrj

OK with the latest update all is well at runtime!

[Image: rel-throughput]

sethrj avatar Sep 05 '24 13:09 sethrj