Warn the user when choice of grid size can cause a CUDA error

Open JonathanMaes opened this issue 1 year ago • 0 comments

This PR adds warnings for two situations where the choice of grid size could result in a CUDA error.

  1. #284 When the number of cells along an axis has a prime factor >127, the CUDA_ERROR_INVALID_VALUE error occurs because of the inner workings of the cuFFT algorithm (see @jplauzie's reply).

    The new warning (example below) is raised as soon as the grid is not 7-smooth, i.e. when the number of cells along an axis has a prime factor greater than 7. This covers the >127 case and also draws attention to the recommendation to use a 7-smooth grid.

    // WARNING: y-axis is not 7-smooth. It has 501 cells, with prime
    //          factors [3 167], at least one of which is greater than 7.
    //          This may reduce performance or cause a CUDA_ERROR_INVALID_VALUE error.
    
  2. #314 When temperature is nonzero, and the grid contains an odd number of cells, the CURAND_STATUS_LENGTH_NOT_MULTIPLE error occurs. This is explained in the curandGenerateNormal documentation:

    Normally distributed results are generated from pseudorandom generators with a Box-Muller transform, and so require n to be even.

    The new warning (example below) is raised when the random thermal field is updated for the first time, if the total number of grid cells is odd.

    // WARNING: nonzero temperature requires an even amount of grid cells,
    //          but all axes have an odd number of cells: [625 625 1].
    //          This may cause a CURAND_STATUS_LENGTH_NOT_MULTIPLE error.
    

These warnings are printed during program execution, so they may be buried within the output. One alternative would be to raise an error, but that seems premature if the CUDA error has not yet occurred. Another would be to print the warning at the very end of the output, but that seems hard to implement.

JonathanMaes, Oct 17 '24 12:10