AoSoA Boolean Data Corruption on GPU with Large Particle Counts

Open • Sichao25 opened this issue 5 months ago • 2 comments

Hello, I am learning to use AoSoA as the data structure for particle analysis. When the member types include a bool, some data becomes corrupted once the particle count exceeds a certain threshold.

In the following reproducer, I create and initialize the AoSoA, then access it again to modify some particle data. The member types include a bool that serves as a mask. When using over 80 million particles, some mask values change from true to false. Changing the bool to an int resolves the error. Additionally, I did not observe the issue on CPU with the same input size, though I cannot guarantee it is safe with larger inputs.

I wonder if there are any restrictions on using Boolean types on GPU? Or am I making obvious mistakes in the code? Any suggestions or thoughts are appreciated.

The code:

#include <Cabana_Core.hpp>
#include <Kokkos_Core.hpp>
#include <cstdlib>
#include <iostream>

using DataTypes = Cabana::MemberTypes<double[3],  // position
                                      double[3],  // computed position
                                      int,        // id
                                      float,      // ellipse_b
                                      float,      // angle
                                      bool>;      // mask

using MemorySpace = Kokkos::DefaultExecutionSpace::memory_space;
using AoSoA_t = Cabana::AoSoA<DataTypes, MemorySpace>;

void count(AoSoA_t& aosoa) {
  Kokkos::View<int*, MemorySpace> count("count", 1);
  auto mask = Cabana::slice<5>(aosoa, "mask");
  Kokkos::parallel_for("count_active", aosoa.size(), KOKKOS_LAMBDA(const int i) {
      if (mask(i)) {
          Kokkos::atomic_fetch_add(&count(0), 1);
      }
  });
  Kokkos::View<int*, Kokkos::HostSpace> active_count("active_count", 1);
  Kokkos::deep_copy(active_count, count);
  std::cout << "Number of active particles: " << active_count(0) << std::endl;
}

int main(int argc, char* argv[])
{
  Kokkos::initialize(argc, argv);

  {
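    // Require the particle count as the first command-line argument.
    if (argc < 2) {
      std::cerr << "Usage: " << argv[0] << " <num_particles>" << std::endl;
      Kokkos::finalize();
      return 1;
    }
    const int size = std::atoi(argv[1]);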

    AoSoA_t aosoa("ParticleData", size);
    auto positions = Cabana::slice<0>(aosoa, "position");
    auto computed = Cabana::slice<1>(aosoa, "computed");
    auto pid = Cabana::slice<2>(aosoa, "id");
    auto ellipse_b = Cabana::slice<3>(aosoa, "ellipse_b");
    auto angle = Cabana::slice<4>(aosoa, "angle");
    auto mask = Cabana::slice<5>(aosoa, "mask");

    std::cout << "Initializing AoSoA with " << size << " elements..." << std::endl;
    
    // Initialize elements using parallel_for
    Kokkos::parallel_for("initialize",
      Kokkos::RangePolicy<>(0, size),
      KOKKOS_LAMBDA(const int i) {
        positions(i, 0) = 1.0;  // x
        positions(i, 1) = 2.0;  // y
        positions(i, 2) = 3.0;  // z
        computed(i, 0) = 1.0;   // computed x
        computed(i, 1) = 2.0;   // computed y
        computed(i, 2) = 3.0;   // computed z
        pid(i) = i;             // id
        ellipse_b(i) = 0.5;     // ellipse b
        angle(i) = 0.25;        // angle
        mask(i) = true;         // all active
      }
    );

    count(aosoa);
    std::cout << "\nModifying elements" << std::endl;
    const auto soa_len = AoSoA_t::vector_length;
    std::cout << "AoSoA vector length: " << soa_len << std::endl;
    Cabana::SimdPolicy<soa_len,Kokkos::DefaultExecutionSpace> simd_policy(0, size);
    Cabana::simd_parallel_for(simd_policy, KOKKOS_LAMBDA( const int soa, const int ptcl ) {
      if (mask.access(soa, ptcl)) {
        positions.access(soa, ptcl, 0) = 1.0;
        positions.access(soa, ptcl, 1) = 2.0;
        positions.access(soa, ptcl, 2) = 3.0;
      }
    });
    count(aosoa);
    std::cout << "capacity " << aosoa.capacity() << " size " << aosoa.size() << " numSoA " << aosoa.numSoA() << std::endl;
  }

  // Finalize Kokkos
  Kokkos::finalize();

  return 0;
}

The output with 80 million particles:

Initializing AoSoA with 80000000 elements...
Number of active particles: 80000000

Modifying elements
AoSoA vector length: 32
Number of active particles: 72806984
capacity 80000000 size 80000000 numSoA 2500000
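
One way to read the numbers above is to check how large the element offsets become. The sketch below is only back-of-the-envelope arithmetic, not a confirmed diagnosis. It assumes each SoA packs its members with no padding and that a slice addresses memory in units of its own element type. `soa_bytes`, `max_offset`, `fits_in_32_bits`, and `check_offsets` are hypothetical helpers, not Cabana or Kokkos functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes in one SoA holding `vector_length` particles of
// <double[3], double[3], int, float, float, Mask>, ASSUMING no padding.
constexpr std::uint64_t soa_bytes(std::uint64_t vector_length,
                                  std::uint64_t mask_bytes) {
  return vector_length * (2 * 3 * sizeof(double) + sizeof(int) +
                          2 * sizeof(float) + mask_bytes);
}

// Largest element offset reached when indexing a member whose elements are
// `elem_bytes` wide: (last SoA) * (SoA stride in elements) + (last lane).
// ASSUMPTION: offsets are counted in elements of the member type.
constexpr std::uint64_t max_offset(std::uint64_t num_soa,
                                   std::uint64_t vector_length,
                                   std::uint64_t mask_bytes,
                                   std::uint64_t elem_bytes) {
  return (num_soa - 1) * (soa_bytes(vector_length, mask_bytes) / elem_bytes) +
         (vector_length - 1);
}

// True if the offset is representable as an unsigned 32-bit integer.
constexpr bool fits_in_32_bits(std::uint64_t offset) {
  return offset <= std::numeric_limits<std::uint32_t>::max();
}

// Values taken from the reported output: numSoA 2500000, vector length 32.
constexpr std::uint64_t bool_mask_offset =
    max_offset(2500000, 32, sizeof(bool), sizeof(bool));
constexpr std::uint64_t int_mask_offset =
    max_offset(2500000, 32, sizeof(int), sizeof(int));

// Runtime restatement of the two observations, callable from main() or a test.
inline void check_offsets() {
  assert(!fits_in_32_bits(bool_mask_offset));  // bool mask: beyond 32 bits
  assert(fits_in_32_bits(int_mask_offset));    // int mask: within 32 bits
}
```

Under these assumptions, the largest offset for the bool mask is beyond the 32-bit range, while the same layout with an int mask stays within it. If 32-bit index arithmetic is involved anywhere in the bool path on GPU (an assumption that would need to be checked against the Cabana and Kokkos sources), that would be consistent with the problem appearing only with bool and only at large particle counts.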

Tested with:

gcc/12.3.0
mpich/4.1.1
cmake/3.26.3
cuda/12.1.1

Sichao25 • Sep 05 '25 23:09

Hi @streeve. Do you think we should switch to int or do you expect bool to work?

cwsmith • Sep 12 '25 19:09

Sorry for the delay, it's a busy time of year here. I just pinned #579 since it's come up a few times, but at best it's a similar type of issue with padding - everything already seems to be in order. @sslattery have you seen any issues with bool specifically?

streeve • Sep 19 '25 20:09