CFLAG
Please update the changelog before merging
There are plenty of unresolved comments; I think there is no need for a new review right now.
I will measure the GPU performance of the transpose operations and the real-to-complex FFT on the Jean-Zay cluster, using V100, A100 and H100 GPUs.
@pbartholomew08 do you think this PR is an issue for UDALES users?
Moving back to draft to allow a major update of the halo first.
Closing: the PR has significant conflicts and uses the previous halo interface.
Please rebase and switch back to a regular PR when a review is needed.
I think there is an issue with the formatting in the log file:
```
In auto-tuning mode...... factors: 1 2 4 8 p_row x p_col 2 4 L2 and...
```
Is the 1D array distributed? Currently, it is possible to read/write 1D arrays using MPI-IO with `decomp_2d_write_scalar` and `decomp_2d_read_scalar`. See https://github.com/2decomp-fft/2decomp-fft/blob/dev/src/io_mpi.f90#L82 and https://github.com/2decomp-fft/2decomp-fft/blob/dev/examples/io_mpi/io_var_test.f90