Totsu
F32CUDA seems too slow
Benchmark, profile, and optimize it to make it faster.
https://github.com/convexbrain/Totsu/releases/tag/totsu_f32cuda_v0.1.0
A benchmark result of LP:
- https://github.com/convexbrain/Totsu/tree/1f5200599ffd8bdf15e6ce672bcc1c2f0bbc11bb/experimental/benchmark_lp
- F32CUDA is faster than FloatGeneric (a timing-harness sketch follows the test environment below).

Test environment:
- CPU
  - Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  - RAM: 32.0 GB
- GPU
  - NVIDIA GeForce RTX 3070
  - CUDA cores: 5888
  - Core clock: 1725 MHz
  - Memory bandwidth: 448.06 GB/s
  - Memory: 8192 MB GDDR6
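
A backend comparison like this comes down to timing the same solve with FloatGeneric and with F32CUDA. Below is a minimal, hypothetical Criterion-style harness showing the shape of such a measurement; `solve_lp_float_generic` and `solve_lp_f32cuda` are placeholder wrappers, not functions from the linked benchmark code.

```rust
// Cargo.toml (dev-dependencies): criterion = "0.5"
use criterion::{criterion_group, criterion_main, Criterion};

// Placeholder wrappers around the two Totsu linear-algebra backends; the real
// measurement code lives in the linked benchmark_lp / benchmark_qp directories.
fn solve_lp_float_generic(_n: usize) {
    // ... build a random LP with n variables and solve it with FloatGeneric
}

fn solve_lp_f32cuda(_n: usize) {
    // ... solve the same LP with the F32CUDA backend
}

fn bench_backends(c: &mut Criterion) {
    for &n in &[100usize, 200, 400, 800] {
        c.bench_function(&format!("FloatGeneric, n = {n}"), |b| {
            b.iter(|| solve_lp_float_generic(n))
        });
        c.bench_function(&format!("F32CUDA, n = {n}"), |b| {
            b.iter(|| solve_lp_f32cuda(n))
        });
    }
}

criterion_group!(benches, bench_backends);
criterion_main!(benches);
```
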
A benchmark result of QP:
- https://github.com/convexbrain/Totsu/tree/884e36b4fd32d696ddca046af755ad8a2d120a61/experimental/benchmark_qp
- F32CUDA is slower than FloatGeneric. 😭

Proceed to profiling using this benchmark.
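
Before reaching for a GPU profiler, a coarse per-phase breakdown can already narrow things down. This is only a sketch of such hand instrumentation; `matrix_ops` and `cone_projection` are placeholder names for whatever the solver actually does in each iteration.

```rust
use std::time::{Duration, Instant};

// Placeholders for the real per-iteration phases of the solver.
fn matrix_ops() {}
fn cone_projection() {}

fn main() {
    let mut t_matrix = Duration::ZERO;
    let mut t_proj = Duration::ZERO;

    for _ in 0..10_000 {
        let t0 = Instant::now();
        matrix_ops(); // e.g. the matrix-vector products of one iteration
        t_matrix += t0.elapsed();

        let t0 = Instant::now();
        cone_projection(); // projection of the iterate onto the cone
        t_proj += t0.elapsed();
    }

    println!("matrix ops:      {t_matrix:?}");
    println!("cone projection: {t_proj:?}");
}
```
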
A profiling result of the QP benchmark:
- Many memory accesses occur when projecting onto the cone.
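
One plausible reading of this (an assumption, not something stated in the repo): the solver projects its iterate onto the cone every iteration, and if that projection runs element by element on the host while the rest of the linear algebra stays on the GPU, the whole vector travels device -> host -> device each time. The projections themselves are arithmetically trivial, as this plain-Rust sketch of the nonnegative-orthant and second-order-cone cases shows; which cones the QP benchmark actually uses is also an assumption here.

```rust
/// Projection onto the nonnegative orthant: x_i <- max(x_i, 0).
fn proj_nonneg_orthant(x: &mut [f32]) {
    for v in x.iter_mut() {
        *v = v.max(0.0);
    }
}

/// Projection onto the second-order cone { (t, z) : ||z||_2 <= t }.
fn proj_second_order_cone(x: &mut [f32]) {
    let (t, z) = match x.split_first_mut() {
        Some(tz) => tz,
        None => return,
    };
    let norm = z.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm <= -*t {
        // Polar-cone case: the projection is the origin.
        z.iter_mut().for_each(|v| *v = 0.0);
        *t = 0.0;
    } else if norm > *t {
        // Outside the cone: scale onto the boundary.
        let a = 0.5 * (1.0 + *t / norm);
        z.iter_mut().for_each(|v| *v *= a);
        *t = a * norm;
    }
    // Otherwise the point is already inside the cone and stays unchanged.
}

fn main() {
    let mut x = [-1.0_f32, 2.0, -3.0];
    proj_nonneg_orthant(&mut x);
    println!("orthant projection: {x:?}");

    let mut y = [1.0_f32, 3.0, 4.0]; // t = 1, ||z|| = 5 > t, so it gets scaled to the boundary
    proj_second_order_cone(&mut y);
    println!("SOC projection:     {y:?}");
}
```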

https://github.com/convexbrain/Totsu/tree/b56407463b691a3f2418510bc43e8a72d5186fc1/experimental/benchmark_qp
- CUDA-izing projection onto cones as much as possible (see the sketch after this list).
- 200 vars (100 primals, 100 duals).

- 400 vars (200 primals, 200 duals).
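
One way to read "CUDA-izing projection onto cones as much as possible", sketched below under assumptions: if the projection is expressed through elementwise and reduction primitives that the device backend provides, only scalars ever cross the bus. The `DeviceVec` trait and its methods are hypothetical, not Totsu's actual F32CUDA API; the `Vec<f32>` impl is just a host mock so the snippet compiles and runs.

```rust
// Hypothetical device-vector interface (NOT Totsu's actual API). Each projection
// step maps onto one device primitive, so the iterate never round-trips through
// host memory; at most a scalar does.
trait DeviceVec {
    /// x_i <- max(x_i, 0), elementwise, on the device.
    fn clamp_min_zero(&mut self);
    /// Euclidean norm via a device reduction; only the scalar result reaches the host.
    fn norm2(&self) -> f32;
    /// x <- a * x on the device.
    fn scale(&mut self, a: f32);
}

/// Nonnegative-orthant projection: a single elementwise kernel launch.
fn proj_nonneg_orthant_dev<V: DeviceVec>(x: &mut V) {
    x.clamp_min_zero();
}

/// Second-order-cone projection of (t, z): the bulk vector z is scaled in place
/// on the device; only t and ||z|| cross the bus.
fn proj_soc_dev<V: DeviceVec>(t: &mut f32, z: &mut V) {
    let norm = z.norm2();
    if norm <= -*t {
        z.scale(0.0);
        *t = 0.0;
    } else if norm > *t {
        let a = 0.5 * (1.0 + *t / norm);
        z.scale(a);
        *t = a * norm;
    }
}

// Host mock, only so the sketch compiles and runs; a real backend would implement
// these methods with CUDA kernels / cuBLAS-style calls.
impl DeviceVec for Vec<f32> {
    fn clamp_min_zero(&mut self) {
        self.iter_mut().for_each(|v| *v = v.max(0.0));
    }
    fn norm2(&self) -> f32 {
        self.iter().map(|v| v * v).sum::<f32>().sqrt()
    }
    fn scale(&mut self, a: f32) {
        self.iter_mut().for_each(|v| *v *= a);
    }
}

fn main() {
    let mut t = 1.0_f32;
    let mut z = vec![3.0_f32, 4.0];
    proj_soc_dev(&mut t, &mut z);
    println!("t = {t}, z = {z:?}"); // lands on the cone boundary: t = 3, z = [1.8, 2.4]
}
```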

https://github.com/convexbrain/Totsu/tree/77f0e5cc10e7a2d29567352f88135a99ed620be1/experimental/benchmark_qp
- FxHashMap instead of HashMap (see the sketch after this list).
- 200 vars (100 primals, 100 duals).
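
`std::collections::HashMap` defaults to a SipHash-based hasher, which is DoS-resistant but comparatively slow for many small lookups; `FxHashMap` from the `rustc-hash` crate is a drop-in replacement with a much faster non-cryptographic hasher. A minimal sketch of the swap; the `usize -> f32` contents are made up and need not match what the solver actually stores in its maps.

```rust
// Cargo.toml: rustc-hash = "1"
use rustc_hash::FxHashMap;

fn main() {
    // Drop-in replacement: the type alias changes, the HashMap API does not.
    let mut coeffs: FxHashMap<usize, f32> = FxHashMap::default();
    coeffs.insert(0, 1.5);
    coeffs.insert(42, -0.25);

    if let Some(v) = coeffs.get(&42) {
        println!("coeffs[42] = {v}");
    }
}
```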

https://github.com/convexbrain/Totsu/tree/13b8d378f79445c53b9c9f77fbf4389029423d12/experimental/benchmark_qp
- Intermittent criteria checks (see the sketch after this list).
- 200 vars (100 primals, 100 duals).
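
Evaluating the stopping criteria presumably involves residual norms, which with a GPU backend means device reductions plus a host read-back on every iteration; running the check only every N iterations amortizes that cost, at the price of overshooting the true stopping point by at most N-1 iterations. A minimal sketch of the pattern; `one_iteration`, `converged`, and the constants are placeholders, not the solver's real code.

```rust
// Run the (relatively expensive) termination test only every CHECK_PERIOD iterations.
const CHECK_PERIOD: usize = 16;
const MAX_ITER: usize = 100_000;

fn solve_sketch<S>(
    state: &mut S,
    mut one_iteration: impl FnMut(&mut S),
    mut converged: impl FnMut(&S) -> bool,
) -> usize {
    for i in 0..MAX_ITER {
        one_iteration(state);

        // The check is assumed to need residual norms (device reductions plus a
        // host read-back with a GPU backend), so it only runs intermittently.
        if (i + 1) % CHECK_PERIOD == 0 && converged(state) {
            return i + 1; // may overshoot the true stopping point by < CHECK_PERIOD
        }
    }
    MAX_ITER
}

fn main() {
    // Toy usage: iterate x <- 0.5 * x until |x| < 1e-6.
    let mut x = 1.0_f32;
    let iters = solve_sketch(&mut x, |x| *x *= 0.5, |x| x.abs() < 1e-6);
    println!("stopped after {iters} iterations, x = {x}");
}
```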


Observations so far:
- The benefit of CUDA only shows up from about 800 variables upward.
- The number of iterations does not grow monotonically with problem size, probably because the QPs are generated from random data.
- To begin with, the number of iterations is simply too large.