Integrate involute surface into ORANGE runtime
Bring the implemented involute geometry from pull 210ce4422 into the runtime. Before merging, the affected sections of the code should be analyzed for compilation efficiency and checked to ensure that the involute code does not slow down unrelated sections when it is not called.
Well, the GPU performance (even when not using involutes) seems to take a substantial hit (~40% longer for testem3 without involutes!) from this branch. It's easy to see why from looking at the occupancy change in the initialize-tracks kernel:
{
"const_mem": 0,
"heap_size": 8388608,
-"local_mem": 152,
-"max_blocks_per_cu": 5,
+"local_mem": 160,
+"max_blocks_per_cu": 2,
"max_threads_per_block": 256,
-"max_warps_per_eu": 40,
+"max_warps_per_eu": 16,
"name": "initialize-tracks",
-"num_regs": 48,
-"occupancy": 0.625,
+"num_regs": 92,
+"occupancy": 0.25,
"print_buffer_size": 5242880,
"stack_size": 1024,
"threads_per_block": 256
and along-step-neutral:
{
"const_mem": 0,
"heap_size": 8388608,
-"local_mem": 0,
+"local_mem": 288,
"max_blocks_per_cu": 2,
"max_threads_per_block": 256,
"max_warps_per_eu": 16,
"name": "along-step-neutral",
-"num_regs": 123,
+"num_regs": 128,
"occupancy": 0.25,
"print_buffer_size": 5242880,
"stack_size": 1024,
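The local memory usage goes way up and the occupancy goes way down (where it can): in initialize-tracks the register count nearly doubles (48 to 92) and occupancy drops from 0.625 to 0.25, while in along-step-neutral, where occupancy was already at 0.25, local memory goes from 0 to 288. For now I'll try disabling the runtime code paths and see if performance goes back to normal.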
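As a rough cross-check of the numbers above, here is a minimal sketch of a register-limited occupancy estimate. The hardware limits (65536 registers and 64 resident warps per multiprocessor, 32 threads per warp) are assumptions, not stated in the kernel diagnostics; they are chosen because they happen to reproduce the reported "max_blocks_per_cu" and "occupancy" values. The function names are hypothetical, and register allocation granularity and other limits are ignored.

```cpp
// Rough register-limited occupancy estimate for a CUDA-like GPU.
// ASSUMED hardware limits (not taken from the thread): 65536 registers and
// 64 resident warps per multiprocessor, 32 threads per warp.
constexpr int regs_per_sm = 65536;
constexpr int max_warps_per_sm = 64;
constexpr int warp_size = 32;

// Number of blocks that fit on one multiprocessor if registers are the
// only limiting resource (allocation granularity is ignored).
constexpr int register_limited_blocks(int num_regs, int threads_per_block)
{
    return regs_per_sm / (num_regs * threads_per_block);
}

// Fraction of the maximum resident warps that can be active at once.
constexpr double estimated_occupancy(int num_regs, int threads_per_block)
{
    int warps_per_block = threads_per_block / warp_size;
    int max_blocks_by_warps = max_warps_per_sm / warps_per_block;
    int blocks = register_limited_blocks(num_regs, threads_per_block);
    if (blocks > max_blocks_by_warps)
    {
        blocks = max_blocks_by_warps;
    }
    return static_cast<double>(blocks * warps_per_block) / max_warps_per_sm;
}

// initialize-tracks: 48 -> 92 registers at 256 threads per block
static_assert(register_limited_blocks(48, 256) == 5, "before: 5 blocks");
static_assert(register_limited_blocks(92, 256) == 2, "after: 2 blocks");
// along-step-neutral: 123 -> 128 registers, already limited to 2 blocks
static_assert(register_limited_blocks(123, 256) == 2, "before: 2 blocks");
static_assert(register_limited_blocks(128, 256) == 2, "after: 2 blocks");
```

Under these assumptions, 5 blocks of 8 warps give 40 of 64 warps (0.625) and 2 blocks give 16 of 64 (0.25), which would explain why along-step-neutral shows no occupancy change despite the extra registers.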
@VHLM2001 Great work this summer! As we discussed last Friday, it's disappointing but unsurprising that this ends up being a burden on the GPU. Next week (?) I can see if we can disable the code paths on GPU so that we can get this merged (and let someone else work on optimization).
I just ran with and without this branch; it seems the runtime "not reachable" is doing its job, but the CPU still suffers a runtime hit. I'm not sure why, since the disabled path should just hit a "throw" in a cold section of the code. There should be no difference...
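For reference, the pattern being described (a disabled case that jumps to a throwing, cold, out-of-line function) could look roughly like the sketch below. This is only an illustration and not the ORANGE code: the enumeration, the `enable_involute` switch, and the function names are all hypothetical, and the `gnu::cold` and `gnu::noinline` attributes are GCC/Clang extensions.

```cpp
#include <stdexcept>

// Hypothetical surface types; not ORANGE's actual enumeration.
enum class SurfaceKind { plane, sphere, involute };

// Hypothetical compile-time switch for involute support.
constexpr bool enable_involute = false;

// Out-of-line failure path that never returns. The cold attribute hints
// to GCC/Clang that calls are unlikely, so the compiler can move this
// code away from the hot path.
[[noreturn]] [[gnu::cold]] [[gnu::noinline]] void throw_unreachable()
{
    throw std::logic_error("unreachable code path");
}

// Placeholder dispatch: the returned values are arbitrary stand-ins for
// per-surface work.
double distance_scale(SurfaceKind kind)
{
    switch (kind)
    {
        case SurfaceKind::plane:
            return 1.0;
        case SurfaceKind::sphere:
            return 2.0;
        case SurfaceKind::involute:
            // When support is disabled, the only code left in this case
            // is the call into the cold function.
            if (!enable_involute)
            {
                throw_unreachable();
            }
            return 3.0;
    }
    throw_unreachable();
}
```

With a structure like this, the enabled cases should not pay for the disabled one beyond the extra switch entry, which is why a measurable CPU slowdown is unexpected.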
OK, with the latest update all is well at runtime!