Marco Barbone
I am creating this milestone to discuss what goes into 2.3 and what is left to do. It can be extended once the tasks for switching to templates are formalized. The...
At the moment, CI has two workflows for Python, which is redundant. The next steps should be:
- [ ] use generate_matrix.py to build/test wheels
- [ ] use another...
Using something like https://godbolt.org/z/GM94xb1j4, it will be possible to dispatch the entire finufft execute call based on the selected plan, removing the runtime branches on transform type, data type, spreading width and dimension.
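A minimal sketch of the idea, assuming a fully templated execute path: the plan's runtime fields (dimension, kernel width, precision) are checked once at the entry point and turned into template arguments, so the code inside `execute_impl` has no branches on them. The names `execute_impl`, `dispatch_width`, `dispatch_dim` and `execute`, the width range 2 to 16, and the placeholder return value are all hypothetical and are not taken from the godbolt link or from finufft.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a templated execute: Dim, Width and T are
// compile-time constants here, so no runtime branching on them is needed.
// The return value just encodes the parameters so dispatch can be checked.
template <int Dim, int Width, typename T>
int execute_impl() {
    static_assert(Dim >= 1 && Dim <= 3, "dimension must be 1, 2 or 3");
    return Dim * 1000 + Width * 10 + static_cast<int>(sizeof(T));
}

// Candidate kernel widths that get instantiated (assumed range).
using Widths = std::integer_sequence<int, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16>;

// Map the runtime width onto one of the compile-time candidates; the fold
// over || short-circuits at the first match.
template <int Dim, typename T, int... Ws>
int dispatch_width(int width, std::integer_sequence<int, Ws...>) {
    int result = -1;
    bool found = ((width == Ws ? (result = execute_impl<Dim, Ws, T>(), true) : false) || ...);
    if (!found) throw std::invalid_argument("unsupported kernel width");
    return result;
}

template <typename T>
int dispatch_dim(int dim, int width) {
    switch (dim) {
        case 1: return dispatch_width<1, T>(width, Widths{});
        case 2: return dispatch_width<2, T>(width, Widths{});
        case 3: return dispatch_width<3, T>(width, Widths{});
        default: throw std::invalid_argument("unsupported dimension");
    }
}

// Single entry point: the selected plan's fields are inspected only here.
int execute(int dim, int width, bool is_double) {
    return is_double ? dispatch_dim<double>(dim, width)
                     : dispatch_dim<float>(dim, width);
}
```

The trade-off is compile time and binary size: every (dimension, width, precision) combination is instantiated, here 3 × 15 × 2 = 90 copies of `execute_impl`.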
It is possible to compile the code in finufft_execute for multiple SIMD instruction sets and select the fastest one available at runtime. This might not impact power users (who compile the code...
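One way to do this, sketched under the assumption of GCC or Clang on x86: compile the same loop twice, once with `__attribute__((target("avx2,fma")))`, and pick a function pointer at startup with `__builtin_cpu_supports`. The `axpy` loop is a hypothetical stand-in for a spreading/interpolation kernel, and all function names are illustrative; on other compilers or architectures the sketch falls back to the generic version. GCC's `target_clones` attribute is an alternative that automates the same selection.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Baseline build: uses only the ISA the translation unit is compiled for.
static void axpy_generic(double a, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
}

#if HAVE_X86_DISPATCH
// Same loop, but the compiler may vectorise it with AVX2/FMA instructions.
__attribute__((target("avx2,fma")))
static void axpy_avx2(double a, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
}
#endif

using axpy_fn = void (*)(double, const double*, double*, std::size_t);

// Choose the implementation once, based on what the running CPU supports.
static axpy_fn select_axpy() {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return axpy_avx2;
#endif
    return axpy_generic;
}
```

In practice the selection would be done once (for example at plan creation) and the pointer stored, so the per-call cost is a single indirect call.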
Type 3 NUFFTs have the potential to be fast on GPU, given the multiple FFTs they require. cuFFT has proven to be extremely fast on GPU.
We should make everything we can `constexpr` in C++17, as this allows part of the computation to be done at compile time when the inputs are known. We should also add macros `CPP_20_CONSTEXPR`...
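A minimal sketch of both points, assuming `CPP_20_CONSTEXPR` is meant to expand to `constexpr` only when compiling as C++20 (the definition below is a guess, not the project's). `padded_width` and `padded_widths` are hypothetical helpers that are already `constexpr` in C++17; `filled` calls `std::fill`, which is `constexpr` only from C++20, so it gets the macro instead. Note that MSVC reports the real `__cplusplus` value only with `/Zc:__cplusplus`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical definition: constexpr under C++20, nothing under C++17.
#if __cplusplus >= 202002L
#define CPP_20_CONSTEXPR constexpr
#else
#define CPP_20_CONSTEXPR
#endif

// Round a kernel width ns up to a multiple of the SIMD lane count.
// Valid in C++17 constant expressions.
constexpr int padded_width(int ns, int lanes) {
    return (ns + lanes - 1) / lanes * lanes;
}

// Build a lookup table at compile time (non-const std::array::operator[]
// is constexpr since C++17).
template <std::size_t N>
constexpr std::array<int, N> padded_widths(int lanes) {
    std::array<int, N> table{};
    for (std::size_t ns = 0; ns < N; ++ns)
        table[ns] = padded_width(static_cast<int>(ns), lanes);
    return table;
}

// Evaluated entirely by the compiler.
constexpr auto kPadded = padded_widths<17>(4);
static_assert(padded_width(7, 4) == 8, "rounded up to the next multiple of 4");
static_assert(kPadded[13] == 16, "table built at compile time");

// std::fill is constexpr only since C++20, hence the macro.
template <std::size_t N>
CPP_20_CONSTEXPR std::array<double, N> filled(double v) {
    std::array<double, N> a{};
    std::fill(a.begin(), a.end(), v);
    return a;
}
```

With C++17 `filled` is an ordinary function; switching the standard to C++20 makes it usable in constant expressions without touching the source again.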
Possible future tasks for 2.4. Depending on future discussions, any of these can be deferred. Grouping for simplicity:
- [ ] De-Macroize #482 #483
- [ ] constexpr everything #487
-...
@mreineck, you mentioned in the past that by tuning the parameters of the spreading function it is possible to achieve more digits at the same width. Do you have...
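For reference, a sketch of the kind of parameterisation being discussed. The standard "exponential of semicircle" kernel is phi(z) = exp(beta * (sqrt(1 - z^2) - 1)) on [-1, 1]; the extra exponent `e` below is an assumed generalisation, exp(beta * ((1 - z^2)^e - 1)), that reduces to the standard kernel at e = 0.5. The function name and default are illustrative only; which (beta, e) pairs give more digits at a given width is exactly the open question.

```cpp
#include <cassert>
#include <cmath>

// Generalised ES-type kernel on z in [-1, 1] (assumed form):
//   phi(z) = exp(beta * ((1 - z^2)^e - 1))
// e = 0.5 gives the standard exp(beta * (sqrt(1 - z^2) - 1)).
// Outside [-1, 1] the kernel is zero (compact support).
inline double es_kernel(double z, double beta, double e = 0.5) {
    assert(beta > 0.0 && e > 0.0);
    if (std::abs(z) > 1.0) return 0.0;
    return std::exp(beta * (std::pow(1.0 - z * z, e) - 1.0));
}
```

The kernel is normalised so phi(0) = 1, and at the support edge it equals exp(-beta), so larger beta makes the truncation sharper at the cost of a wider Fourier transform.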
In my experiments I do not see much benefit; performance is more or less the same. It may be worth trying specific parameters that stress the bin sort more.
phihat1, phihat2 and phihat3 are not correctly freed in some error cases, resulting in memory leaks.
This problem will go away once we de-macroize and use STL containers instead of malloc/free.
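A minimal sketch of why STL containers fix this, assuming the phihat arrays become `std::vector` members of the plan. `PlanSketch`, `compute_phihats`, the error code and the `nf / 2 + 1` sizes are hypothetical stand-ins for the real plan fields and setup code. The arrays are built as locals and moved into the plan only on success, so an early return or a `std::bad_alloc` after a partial allocation releases everything automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical plan fragment owning the kernel Fourier coefficients.
struct PlanSketch {
    std::vector<double> phihat1, phihat2, phihat3;
};

constexpr int ERR_SKETCH_BAD_SIZE = 1;  // hypothetical error code

inline int compute_phihats(PlanSketch& plan, std::size_t nf1, std::size_t nf2, std::size_t nf3) {
    std::vector<double> phihat1(nf1 / 2 + 1);
    std::vector<double> phihat2(nf2 / 2 + 1);
    // Hypothetical failure after a partial allocation: phihat1 and phihat2
    // are destroyed on return, so there is nothing to free by hand.
    if (nf3 == 0) return ERR_SKETCH_BAD_SIZE;
    std::vector<double> phihat3(nf3 / 2 + 1);
    // ... fill the coefficients here ...
    // Commit to the plan only once everything has succeeded.
    plan.phihat1 = std::move(phihat1);
    plan.phihat2 = std::move(phihat2);
    plan.phihat3 = std::move(phihat3);
    return 0;
}
```

Because the plan members are vectors as well, destroying the plan frees them without any explicit cleanup code in the error paths.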