Marco Barbone issues

Results: 20 issues of Marco Barbone

Towards 2.3

I am creating this milestone to discuss what goes into 2.3 and what is left to do. This can be extended once the switch to template tasks is formalized. The...

python-cmake and python-buildwheel in CI should be merged into one

At the moment CI has two workflows for Python, which is redundant. The next steps should be:
- [ ] use generate_matrix.py to build/test wheels
- [ ] use another...

Using something like https://godbolt.org/z/GM94xb1j4, it will be possible to dispatch the entire finufft execute based on the selected plan, removing branches on transform type, data type, spreading width and dimensions.
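The idea can be illustrated with a minimal sketch. This is not the code behind the godbolt link; `Plan`, `execute_impl`, `dispatch_dim` and `execute` are hypothetical names, and only transform type and dimension are dispatched to keep it short. Runtime plan fields are converted to template arguments once, so the specialised kernel contains no branches on them.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical plan: field names are illustrative, not FINUFFT's actual struct.
struct Plan {
  int type; // transform type: 1 or 2
  int dim;  // number of dimensions: 1, 2 or 3
};

// Kernel specialised at compile time: Type and Dim are constants here,
// so no runtime branch on them remains inside.
template <int Type, int Dim> int execute_impl() {
  // A real kernel would run the transform; this returns a tag that shows
  // which instantiation was selected.
  return Type * 10 + Dim;
}

// Turn the runtime dimension into a template argument.
template <int Type> int dispatch_dim(const Plan &p) {
  switch (p.dim) {
  case 1: return execute_impl<Type, 1>();
  case 2: return execute_impl<Type, 2>();
  case 3: return execute_impl<Type, 3>();
  default: throw std::invalid_argument("unsupported dim");
  }
}

// Entry point: branch once per call, outside the hot loops.
int execute(const Plan &p) {
  switch (p.type) {
  case 1: return dispatch_dim<1>(p);
  case 2: return dispatch_dim<2>(p);
  default: throw std::invalid_argument("unsupported type");
  }
}
```

Calling `execute(Plan{1, 2})` selects `execute_impl<1, 2>`. Data type and spreading width could be added as further template parameters in the same way, at the cost of more instantiations and longer compile times.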

FINUFFT_EXECUTE architecture dispatch

It is possible to compile the code in finufft_execute for multiple SIMD instruction sets and select the fastest one available at runtime. This might not impact power users (who compile the code...
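A rough sketch of runtime selection, assuming GCC or Clang on x86 (it relies on the `__builtin_cpu_supports` builtin and the `target` function attribute, and falls back to the generic path elsewhere). The function names and the summation loop are placeholders, not FINUFFT code.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

// Portable baseline, built with the default compiler flags.
static double sum_generic(const double *x, std::size_t n) {
  double s = 0;
  for (std::size_t i = 0; i < n; ++i) s += x[i];
  return s;
}

#if SKETCH_X86_DISPATCH
// Same loop, but the compiler is allowed to emit AVX2 instructions for it.
__attribute__((target("avx2"))) static double sum_avx2(const double *x, std::size_t n) {
  double s = 0;
  for (std::size_t i = 0; i < n; ++i) s += x[i];
  return s;
}
#endif

using SumFn = double (*)(const double *, std::size_t);

// Pick an implementation based on what the CPU reports at runtime.
static SumFn select_sum() {
#if SKETCH_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
  return sum_generic;
}

// Public entry point: the choice is made once, on the first call.
double sum(const double *x, std::size_t n) {
  static const SumFn fn = select_sum();
  return fn(x, n);
}
```

The cost is one indirect call per invocation, which is negligible when the dispatched function does a whole execute step rather than a small inner loop.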

Type 3 on GPU

Type 3 NUFFTs have the potential to be fast on GPU given the multiple FFTs required. cuFFT has proved to be extremely fast on GPU.

constexpr everything

We should mark as constexpr everything that C++17 allows, since this lets part of the computation be done at compile time when the data is needed. We should also add macros `CPP_20_CONSTEXPR`...
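As a small illustration of what C++17 `constexpr` permits, the sketch below builds a table that the compiler can evaluate entirely, while the same function stays usable with runtime inputs. `powers`, `kTable` and `power_at_runtime` are hypothetical names, not taken from the code base.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: x^0 .. x^(N-1). Loops and local mutation are allowed
// in constexpr functions since C++14; std::array element writes since C++17.
template <std::size_t N> constexpr std::array<double, N> powers(double x) {
  std::array<double, N> out{};
  double p = 1.0;
  for (std::size_t i = 0; i < N; ++i) {
    out[i] = p;
    p *= x;
  }
  return out;
}

// Evaluated by the compiler: the table is a constant in the binary.
constexpr auto kTable = powers<4>(2.0);
static_assert(kTable[3] == 8.0, "computed at compile time");

// The same function still works when the argument is only known at runtime.
double power_at_runtime(double x) { return powers<4>(x)[2]; }
```

The `static_assert` only compiles if the table really is a constant expression, which makes it a cheap way to check that a function is usable at compile time.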

Towards 2.4

Possible future tasks for 2.4. Based on future discussions, this can be deferred at any time. Grouping for simplicity:
- [ ] De-Macroize #482 #483
- [ ] constexpr everything #487
-...

Prolate spheroidal wave function

@mreineck you mentioned in the past that by playing with the parameters of the spreading function it is possible to achieve more digits at the same width. Do you have...

enhancement

vectorized bin-sort singlethreaded

In my experiments I do not see much benefit; performance is more or less the same. It may be worth trying with specific parameters that stress bin sort more.

phihat1, phihat2 and phihat3 are not correctly freed in some error cases resulting in memory leaks

This problem will go away once we de-macroize and use STL containers instead of malloc/free.
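A minimal sketch of why STL containers help here, assuming the buffers can be held in `std::vector`. The function `setup_kernel_ft` and its `fail_midway` flag are hypothetical; only the buffer names come from the issue title. Each vector releases its memory when it goes out of scope, so an early return on an error path cannot leak.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical setup routine: the early return stands in for any error path.
int setup_kernel_ft(std::size_t n1, std::size_t n2, bool fail_midway) {
  std::vector<double> phihat1(n1, 0.0);
  if (fail_midway) return 1; // phihat1 is released automatically here

  std::vector<double> phihat2(n2, 0.0);
  if (!phihat1.empty() && !phihat2.empty()) {
    phihat2[0] = phihat1[0] + 1.0; // placeholder for the real computation
  }
  return 0; // both vectors are released automatically here
}
```

With malloc/free, every such return would need a matching `free` for each buffer allocated so far, which is where the leaks come from.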