
feat: use thrust::inclusive_scan for recursion

Open yoyolicoris opened this issue 4 months ago • 0 comments

[------------------------------------- TorchLPC -------------------------------------]
                        |  v07, complex    |  thrust, complex |     v07    |  thrust
4 threads: ---------------------------------------------------------------------------
      [8, 16384, 1]     |        256.6     |       274.6      |     252.5  |    273.6 
      [8, 65536, 1]     |        290.1     |       290.7      |     271.7  |    275.1 
      [8, 262144, 1]    |        483.7     |       495.2      |     416.0  |    308.5 
      [32, 16384, 1]    |        282.9     |       287.9      |     263.0  |    274.5 
      [32, 65536, 1]    |        558.2     |       498.2      |     385.8  |    305.6 
      [32, 262144, 1]   |       1909.9     |      2007.3      |    1380.9  |   1018.4 
      [128, 16384, 1]   |        473.9     |       501.4      |     322.2  |    307.3 
      [128, 65536, 1]   |       1654.8     |      2004.9      |     852.7  |   1018.1 
      [128, 262144, 1]  |       7467.1     |      7928.6      |    4992.5  |   3941.5 

Times are in microseconds (us).
  • With complex numbers, thrust is slightly slower than v0.7 in every configuration except [32, 65536, 1].
  • With real numbers, thrust is faster on the longest sequences (262144) at every batch size, by roughly 21 to 26%, but slower at [128, 65536, 1].
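
For readers unfamiliar with how a recursion fits into an inclusive scan, here is a minimal sketch. It assumes the recursion is a first-order linear recurrence `h[t] = a[t] * h[t-1] + x[t]` with a zero initial state, which matches the order-1 shapes in the table but may differ in sign convention from the actual torchlpc kernel. Python's `itertools.accumulate` stands in for `thrust::inclusive_scan`, and the helper names `combine`, `recurrence_via_scan`, and `recurrence_loop` are hypothetical. Each step is treated as an affine map, and composing affine maps is associative, which is what lets a parallel scan evaluate the recursion.

```python
from itertools import accumulate


def combine(left, right):
    """Compose two affine steps h -> a*h + x (left is applied first)."""
    a1, x1 = left
    a2, x2 = right
    # Applying (a1, x1) then (a2, x2) gives h -> a2*(a1*h + x1) + x2.
    return (a1 * a2, a2 * x1 + x2)


def recurrence_via_scan(a, x):
    """h[t] = a[t] * h[t-1] + x[t] with h[-1] = 0, as an inclusive scan."""
    # The second component of each scanned pair is h[t].
    return [h for _, h in accumulate(zip(a, x), combine)]


def recurrence_loop(a, x):
    """Sequential reference implementation of the same recurrence."""
    h, out = 0.0, []
    for a_t, x_t in zip(a, x):
        h = a_t * h + x_t
        out.append(h)
    return out


a = [0.5, -0.25, 2.0, 1.0]
x = [1.0, 2.0, 3.0, 4.0]
print(recurrence_via_scan(a, x))
print(recurrence_loop(a, x))
```

The same operator works unchanged for complex inputs, at the cost of more arithmetic per combine than in the real case.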

yoyolicoris, Jul 23 '25 20:07