CFLAG

39 comments by CFLAG

Please update the changelog before merging

There are plenty of unresolved comments, so I think there is no need for a new review right now.

I will measure the GPU performance of transpose operations and real-to-complex FFTs on the Jean-Zay cluster, using V100, H100 and A100 GPUs.

@pbartholomew08 do you think this PR is an issue for UDALES users?

Back to draft, to allow a major update of the halo first.

Closing: the PR has significant conflicts and uses the previous halo interface.

Please rebase and switch to a regular PR when review is needed

I think there is an issue with the formatting in the log file: ``` In auto-tuning mode...... factors: 1 2 4 8 p_row x p_col 2 4 L2 and...

Is the 1D array distributed? Currently, it is possible to read and write 1D arrays using MPI-IO with `decomp_2d_write_scalar` and `decomp_2d_read_scalar`. See https://github.com/2decomp-fft/2decomp-fft/blob/dev/src/io_mpi.f90#L82 and https://github.com/2decomp-fft/2decomp-fft/blob/dev/examples/io_mpi/io_var_test.f90