
Research alternative to Bresenham's raytracing algorithm

Open glpuga opened this issue 2 years ago • 4 comments

Description

  • [ ] Evaluate the performance of Bresenham's algorithm for the Beam Sensor Model to determine if it's worth optimizing
  • [ ] Research this paper as a potential alternative

Definition of done

Either the current raytracing approach is deemed not worth optimizing at this stage, or a decision is made that the alternative is worth implementing (in a separate issue).

Additional considerations

This ticket is not about implementing changes; it's about evaluating alternatives.

glpuga avatar May 01 '23 15:05 glpuga

Evaluate the performance of Bresenham's algorithm for the Beam Sensor Model to determine if it's worth optimizing

Note benchmarks are included in #160.

Integer arithmetic-based Bresenham algorithm benchmark
Run on (16 X 3200 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.52, 1.47, 2.08
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_Bresenham2i/Standard/128        92.1 ns         92.1 ns      7475814
BM_Bresenham2i/Standard/256         181 ns          181 ns      3802972
BM_Bresenham2i/Standard/512         360 ns          360 ns      1933997
BM_Bresenham2i/Standard/1024        704 ns          704 ns       974018
BM_Bresenham2i/Standard/2048       1411 ns         1410 ns       495547
BM_Bresenham2i/Standard/4096       2800 ns         2799 ns       250246
BM_Bresenham2i/Standard_BigO       0.69 N          0.68 N    
BM_Bresenham2i/Standard_RMS           1 %             1 %    
BM_Bresenham2i/Modified/128         390 ns          390 ns      1794217
BM_Bresenham2i/Modified/256         764 ns          764 ns       900910
BM_Bresenham2i/Modified/512        1527 ns         1526 ns       457221
BM_Bresenham2i/Modified/1024       3054 ns         3054 ns       228170
BM_Bresenham2i/Modified/2048       6136 ns         6135 ns       113027
BM_Bresenham2i/Modified/4096      12276 ns        12275 ns        56818
BM_Bresenham2i/Modified_BigO       3.00 N          3.00 N    
BM_Bresenham2i/Modified_RMS           0 %             0 %

Ray casting benchmark
Run on (16 X 3200 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.89, 1.00, 1.61
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
BM_RayCasting2d/128         175 ns          175 ns      3972583
BM_RayCasting2d/256         298 ns          298 ns      2362109
BM_RayCasting2d/512         589 ns          589 ns      1169115
BM_RayCasting2d/1024       1168 ns         1168 ns       589929
BM_RayCasting2d_BigO       1.15 N          1.15 N    
BM_RayCasting2d_RMS           3 %             3 %

These numbers don't look terrible, but when ray tracing 100 times for 2000 different states in a ~400 x ~400 grid, it quickly adds up to ~50 ms. I did some profiling, and while we could shave off some of that by eliminating Eigen types, it'll probably take a different algorithm (approximate, multi-resolution, whatever) to keep it under control for denser grids.

hidmic avatar May 03 '23 16:05 hidmic

It's great that we have the benchmarks already.

I think when deciding whether further optimization is worth pursuing, we need to evaluate the execution time in the context of the total execution time per iteration.

A 5x improvement in Bresenham will barely be noticeable if Bresenham only accounts for 15% of the total iteration time, but it may be crucial if it represents 80% of it.

glpuga avatar May 04 '23 13:05 glpuga

FWIW I don't think we can improve the computation itself much further. We need to reduce the number of computations.

hidmic avatar May 04 '23 13:05 hidmic

@glpuga FYI, here are the flamegraphs from likelihood field and beam models for comparison. Downloading the files and re-opening them in Chrome should enable the zoom feature.

Main version: https://github.com/Ekumen-OS/beluga/commit/e72e1489a74d1302ddc9ddcd5b7660d26625ea92

likelihood_flamegraph beam_flamegraph

nahueespinosa avatar May 25 '23 14:05 nahueespinosa