Totsu

F32CUDA seems too slow

Open • convexbrain opened this issue 3 years ago • 8 comments

Benchmark and profile it, then optimize it for speed.

https://github.com/convexbrain/Totsu/releases/tag/totsu_f32cuda_v0.1.0

convexbrain • Sep 27 '22 11:09

A benchmark result for LP:

  • https://github.com/convexbrain/Totsu/tree/1f5200599ffd8bdf15e6ce672bcc1c2f0bbc11bb/experimental/benchmark_lp
  • F32CUDA is faster than FloatGeneric.

(Figure: Benchmark of LP)

Benchmark environment:

  • CPU
    • Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
    • RAM: 32.0 GB
  • GPU
    • NVIDIA GeForce RTX 3070
    • CUDA cores: 5888
    • Core clock: 1725 MHz
    • Memory bandwidth: 448.06 GB/s
    • Memory: 8192 MB GDDR6

convexbrain • Oct 02 '22 05:10

A benchmark result for QP:

  • https://github.com/convexbrain/Totsu/tree/884e36b4fd32d696ddca046af755ad8a2d120a61/experimental/benchmark_qp
  • F32CUDA is slower than FloatGeneric. 😭

(Figure: Benchmark of QP)

Next step: profiling with this benchmark.

convexbrain • Oct 05 '22 13:10

A profiling result of the QP benchmark:

  • Many memory accesses occur during projection onto the cone.
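
Projection onto common cones is a streaming, element-wise operation: each entry is read, combined with at most a norm, and written back, so there is very little arithmetic per byte moved. The sketch below illustrates this for two cones chosen as an assumption (the nonnegative orthant and the second-order cone). Which cones Totsu's QP formulation actually uses, and how F32CUDA implements the projection, are not shown in this thread; `proj_nonneg` and `proj_soc` are hypothetical names.

```rust
// Sketch only: the cones and function names here are assumptions,
// not Totsu's actual cone implementations.

/// Projects `x` in place onto the nonnegative orthant { x | x_i >= 0 }.
/// One read and at most one write per element: almost no arithmetic.
fn proj_nonneg(x: &mut [f32]) {
    for v in x.iter_mut() {
        if *v < 0.0 {
            *v = 0.0;
        }
    }
}

/// Projects `x = (t, z)` in place onto the second-order cone { (t, z) | ||z||_2 <= t }.
/// One pass to compute ||z||_2, then at most one more pass to rescale.
fn proj_soc(x: &mut [f32]) {
    let (t, z) = x
        .split_first_mut()
        .expect("a second-order-cone vector needs at least one entry");
    let norm_z = z.iter().map(|v| v * v).sum::<f32>().sqrt();

    if norm_z <= *t {
        // Already inside the cone: nothing to do.
    } else if norm_z <= -*t {
        // Inside the polar cone: the projection is the origin.
        *t = 0.0;
        z.iter_mut().for_each(|v| *v = 0.0);
    } else {
        // Otherwise project onto the cone's boundary.
        let s = 0.5 * (norm_z + *t);
        let scale = s / norm_z;
        *t = s;
        z.iter_mut().for_each(|v| *v *= scale);
    }
}
```

Both functions work in place on a slice. On a GPU backend the orthant case maps naturally onto one thread per element, and the second-order cone onto a reduction (for the norm) followed by a scale.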

convexbrain • Oct 14 '22 14:10

https://github.com/convexbrain/Totsu/tree/b56407463b691a3f2418510bc43e8a72d5186fc1/experimental/benchmark_qp

  • Projection onto cones moved to CUDA as much as possible.
  • 200 vars (100 primals, 100 duals).

convexbrain • Jan 04 '23 16:01

  • 400 vars (200 primals, 200 duals).

convexbrain • Jan 04 '23 17:01

https://github.com/convexbrain/Totsu/tree/77f0e5cc10e7a2d29567352f88135a99ed620be1/experimental/benchmark_qp

  • FxHashMap instead of HashMap.
  • 200 vars (100 primals, 100 duals).
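
`std::collections::HashMap` uses SipHash-1-3 by default, which resists hash-flooding attacks but is relatively expensive for small keys. `FxHashMap` swaps in FxHash, a cheap rotate-xor-multiply hash. The sketch below reimplements a simplified Fx-style hasher so that it runs without an external crate. `FxLikeHasher` and `FxLikeHashMap` are hypothetical names, the byte-slice path is simplified and not bit-identical to `rustc-hash`, and what F32CUDA actually keys its map by is not stated in this thread.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Multiplier of the classic 64-bit FxHash (as in rustc-hash 1.x).
const FX_SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

/// Simplified Fx-style hasher (hypothetical name): for each input word,
/// rotate the state, xor in the word, and multiply by a constant.
#[derive(Default)]
struct FxLikeHasher {
    hash: u64,
}

impl FxLikeHasher {
    fn add_to_hash(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(FX_SEED);
    }
}

impl Hasher for FxLikeHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Consume 8-byte little-endian words, then leftover bytes one at a time
        // (a simplification; not bit-identical to rustc-hash).
        let mut chunks = bytes.chunks_exact(8);
        for chunk in &mut chunks {
            let mut word = [0u8; 8];
            word.copy_from_slice(chunk);
            self.add_to_hash(u64::from_le_bytes(word));
        }
        for &b in chunks.remainder() {
            self.add_to_hash(u64::from(b));
        }
    }

    // Integer keys (the common fast path) hash in a single step.
    fn write_u64(&mut self, i: u64) {
        self.add_to_hash(i);
    }

    fn write_usize(&mut self, i: usize) {
        self.add_to_hash(i as u64);
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// A standard `HashMap` whose hasher is the Fx-style one instead of SipHash-1-3.
type FxLikeHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FxLikeHasher>>;
```

Because such a map is a `HashMap` with a non-default hasher, it is created with `FxLikeHashMap::default()` rather than `HashMap::new()`; the same applies to a real `FxHashMap`. FxHash is not DoS-resistant, which is acceptable when keys are not attacker-controlled.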

convexbrain • Jan 05 '23 07:01

https://github.com/convexbrain/Totsu/tree/13b8d378f79445c53b9c9f77fbf4389029423d12/experimental/benchmark_qp

  • Intermittent criteria checks.
  • 200 vars (100 primals, 100 duals).
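
The thread only says "intermittent criteria checks". A plausible reading is that the convergence (termination) criteria are evaluated every few iterations instead of every iteration, since computing residuals costs extra work (and, on a GPU backend, presumably a synchronization to bring the result back to the host). The sketch below shows that pattern on a toy fixed-point iteration; `solve_with_intermittent_check` and its parameters are hypothetical, not Totsu's API.

```rust
/// Runs `step` until the residual drops to `tol` or `max_iter` is reached,
/// but evaluates the (comparatively costly) `residual` only on every
/// `check_interval`-th iteration. Returns the iteration at which convergence
/// was detected, or `None` if `max_iter` iterations ran without it.
/// Hypothetical helper for illustration; not part of Totsu.
fn solve_with_intermittent_check<S, R>(
    x: &mut [f64],
    mut step: S,
    residual: R,
    tol: f64,
    check_interval: usize,
    max_iter: usize,
) -> Option<usize>
where
    S: FnMut(&mut [f64]),
    R: Fn(&[f64]) -> f64,
{
    assert!(check_interval > 0, "check_interval must be positive");
    for k in 1..=max_iter {
        step(x);
        // Skip the criteria check on all but every `check_interval`-th iteration.
        if k % check_interval == 0 && residual(x) <= tol {
            return Some(k);
        }
    }
    None
}
```

The trade-off is that convergence can be detected up to `check_interval - 1` iterations late. On the toy update x <- x - 0.5 (x - c) with c = (1, -2, 3), x starting at 0, a max-abs residual and tol = 1e-6, checking every iteration stops at iteration 22, while checking every 10th iteration stops at iteration 30.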

convexbrain • Jan 05 '23 17:01

(Figure: Benchmark of QP (1))

  • The benefit of CUDA shows up from about 800 variables upward.
  • The number of iterations does not increase monotonically with problem size, probably because the QPs are generated from random numbers.
  • More fundamentally, the number of iterations is too large.

convexbrain • Jan 07 '23 05:01