Multi-GPU Particle Tracking & Immersed Boundary
- Why doesn’t FluidX3D support multi-GPU passive particle tracking? What’s the biggest technical roadblock here? (e.g., data transfer between GPUs? Syncing particle positions?)
- Any ideas for implementing multi-GPU support? Would splitting the domain across GPUs work?
Also, I'm trying to implement an immersed boundary method by adapting the existing Particle code, and it seems to work so far.
Thanks in advance!
Regards, Jason
Hi @jasonxauat,
particles on multi-GPU require a lot of extra logic to determine which particles move between GPU domains and have to be copied over in between time steps. There are good solutions for this in the literature - especially immersed-boundary codes for blood flow simulations, e.g. Ames et al., Kotsalos et al.. I was just too lazy to implement it.
Kind regards, Moritz
Hi, Dr. Moritz, Based on your suggestion, I've developed a simpler modification plan to enable multi-GPU support for particles in FluidX3D. Here's a summary of the approach:
Proposed Multi-GPU Particle Support Solution
Instead of implementing complex particle migration logic between GPU domains, we propose a simpler approach where each GPU stores the complete particle list, but only computes interactions for particles within its domain.
Key Modifications:

- Full Particle Replication: Each GPU domain stores the complete particle list rather than partitioning particles across domains.
- Domain-aware Computation: Modify the kernel to check whether each particle is inside the current GPU's domain before computing forces:

  ```c
  // only compute forces for particles inside the current domain
  const bool in_domain = (p.x>=min_x && p.x<max_x && p.y>=min_y && p.y<max_y && p.z>=min_z && p.z<max_z);
  if(in_domain) {
      // compute forces and spread them to the fluid grid
  }
  ```
- Synchronization: Add a new `communicate_particles()` function to synchronize particle positions across all GPU domains after each time step (a host-side sketch is given below this list).
- Transfer Kernels: Implement `transfer_extract_particles` and `transfer_insert_particles` kernels to handle full particle list transfers between domains.
- Memory Management: Modify memory allocation so each `LBM_Domain` allocates space for the complete particle set rather than a partitioned subset.
- Remove Limitation: Eliminate the check that prevents particles from running in multi-GPU mode.
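As referenced above, here is a minimal, self-contained host-side sketch of what the merge step of such a `communicate_particles()` could look like under full replication. Plain `std::vector` arrays stand in for the per-domain particle buffers; the actual FluidX3D buffer API and domain bookkeeping are not reproduced here.

```cpp
#include <cstddef>
#include <vector>

// Axis-aligned bounds of one GPU domain (in global simulation coordinates).
struct DomainBox { float min_x, max_x, min_y, max_y, min_z, max_z; };

inline bool in_domain(const DomainBox& b, float px, float py, float pz) {
	return px>=b.min_x && px<b.max_x && py>=b.min_y && py<b.max_y && pz>=b.min_z && pz<b.max_z;
}

// x/y/z: one full-length coordinate array per domain, as downloaded from each GPU after advection.
// For every particle, the copy held by the domain that contains it is broadcast to all other domains,
// so every GPU ends up with an identical, up-to-date particle list before the next time step.
void communicate_particles(std::vector<std::vector<float>>& x, std::vector<std::vector<float>>& y,
                           std::vector<std::vector<float>>& z, const std::vector<DomainBox>& boxes) {
	const std::size_t D = boxes.size(), N = x[0].size();
	for(std::size_t n=0; n<N; n++) {
		std::size_t owner = 0; // fall back to domain 0 if no domain claims the particle
		for(std::size_t d=0; d<D; d++) {
			if(in_domain(boxes[d], x[d][n], y[d][n], z[d][n])) { owner = d; break; }
		}
		for(std::size_t d=0; d<D; d++) { // broadcast the authoritative copy to all domains
			x[d][n] = x[owner][n]; y[d][n] = y[owner][n]; z[d][n] = z[owner][n];
		}
	}
	// ...afterwards, each domain's arrays are uploaded back to its GPU.
}
```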
Advantages:
- Simpler implementation without complex particle migration logic
- Ensures all GPUs have complete particle information
- Minimal changes to existing code structure
- Maintains correctness of particle-fluid interactions
Trade-offs:
- Increased memory usage (D times more, where D = number of GPUs)
- Communication overhead for synchronizing full particle lists
- Computational redundancy (all GPUs compute all particle movements, but only apply forces for local particles)
Would this approach be acceptable for enabling multi-GPU particle support in FluidX3D? Are there any concerns with this strategy or suggestions for improvement?
Hi @jasonxauat,
that approach sounds very reasonable. The memory overhead isn't even that bad: since it could happen that all particles move from one domain into another, each domain must have enough free VRAM to hold all particles anyway, so it can/should allocate the entire particle buffer in the first place; the big plus is that then no memory reallocation is ever required.
I'll give the implementation a shot; I have another week of vacation in 2 weeks.
Thanks a lot for sharing your ideas!
Kind regards, Moritz
Thank you for your approval. With the help of qwen3-coder I have modified the code over the past few days, and the approach basically works. However, because the per-domain particle arrays have to be read and written on both the host and the device, the efficiency is still not very high, even with the offset read/write functions (`enqueue_read_from_device` & `enqueue_write_to_device`). The difficulty is that at least one array has to be synchronized whose values mark which particles have been updated.
Is there any way to read and write single particles quickly, e.g. exchanging the data only between GPUs? Could you take a look at my code?
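For what it's worth, the plain OpenCL API does support both ranged host transfers and device-side buffer-to-buffer copies, so in principle single particles can be moved without touching the full buffer. This is only a sketch: `queue_a`, `particles_x_a` and `particles_x_b` are placeholder handles, and copying between two domains' buffers with `clEnqueueCopyBuffer` assumes they were created in the same `cl_context`, which may not hold for FluidX3D's per-domain setup.

```cpp
#include <CL/cl.h>

// read back only the x-coordinate of particle n (ranged transfer instead of the whole buffer)
void read_single_particle_x(cl_command_queue queue_a, cl_mem particles_x_a, size_t n, float* x_n) {
	clEnqueueReadBuffer(queue_a, particles_x_a, CL_TRUE, n*sizeof(float), sizeof(float), x_n, 0, NULL, NULL);
}

// copy the x-coordinates of `count` particles starting at index n from one domain's buffer to another's,
// staying on the device side (requires both cl_mem objects to belong to the same cl_context)
void copy_particle_range_x(cl_command_queue queue_a, cl_mem particles_x_a, cl_mem particles_x_b, size_t n, size_t count) {
	clEnqueueCopyBuffer(queue_a, particles_x_a, particles_x_b, n*sizeof(float), n*sizeof(float), count*sizeof(float), 0, NULL, NULL);
}
```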
Hi Jason,
did you implement the immersed boundary for moving/rotating parts too? Is it better compared to re-voxelization, or is it just a different philosophy?
regards Peter
Hi Peter,
I have tried implementing the immersed boundary method by treating particles as nodes. Since the particle functionality is inherently based on the immersed boundary approach, it involves calculating particle forces and updating their positions. When using the immersed boundary for walls, it's necessary to appropriately modify the kernel to control the particle positions. So far, I've only experimented with stationary nodes, but moving or rotating nodes should also be feasible with additional control mechanisms.
There have been numerous studies comparing IB with re-voxelization (bounce-back), such as C. Shu's work or my previous paper. In short, IB is better suited to handling curved surfaces and moving boundaries, but its accuracy requires additional correction. In FluidX3D the classical explicit diffuse IB method is used, so additional processing is required to satisfy the no-slip condition at the wall, e.g.:
- Wu, J. and Shu, C., "Implicit velocity correction-based immersed boundary-lattice Boltzmann method and its applications," Journal of Computational Physics, 2009, 228(6): 1963-1979.
- Kang, S.K. and Hassan, Y.A., "A comparative study of direct-forcing immersed boundary-lattice Boltzmann methods for stationary complex boundaries," International Journal for Numerical Methods in Fluids, 2011, 66(9): 1132-1158.
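For reference, the classical explicit diffuse-interface IB step (velocity interpolation to each Lagrangian point $\mathbf{X}_k$, then spreading of the boundary force $\mathbf{F}_k$ back to the lattice) can be written as

$$\mathbf{U}(\mathbf{X}_k)=\sum_{\mathbf{x}}\mathbf{u}(\mathbf{x})\,\delta_h(\mathbf{x}-\mathbf{X}_k)\,\Delta x^3,\qquad \mathbf{f}(\mathbf{x})=\sum_k\mathbf{F}_k\,\delta_h(\mathbf{x}-\mathbf{X}_k)\,\Delta s_k,$$

where $\delta_h$ is a discrete delta (kernel) function; the implicit velocity-correction methods cited above solve for $\mathbf{F}_k$ so that the interpolated boundary velocity matches the prescribed wall velocity exactly, restoring the no-slip condition.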
Best regards, Jason
Hello Jason,
thanks for your feedback. I thought you had implemented the IBM for geometry motion. For particles it is also interesting, especially if you want to model complex shapes. However, a DEM kernel is missing at the moment, but coupling with an external DEM tool could be a good feature.
Peter
Hi Peter,
I completely agree with the DEM coupling idea. Since the current setup lacks a built-in DEM kernel, linking with an external DEM tool makes perfect sense. However, it's crucial to find a suitable DEM tool based on OpenCL; otherwise, FluidX3D would lose its advantage of high computational efficiency.
Best regards, Jason
Coupling it with an external DEM tool is the easiest part. The tricky part is the CFD solver and maybe the communication with a huge number of particles, but up to 10M should be fine. I don't know any good open-source DEM tool with a high-performance GPU solver at the moment; only commercial tools have usable GPU solvers. Most of them are CUDA-based, but the communication runs via the CPU, which slows things down a little. So it is possible to couple two tools, with perhaps some disadvantages compared to one tool with an internal DEM. But I don't know if somebody has started to implement a DEM kernel for FluidX3D.
Hi Peter,
High-performance GPU DEM tools are rare: the open-source ones mostly lack GPU solvers, and the capable ones are commercial, CUDA-based, and (as you said) suffer from CPU communication lag. So finding an OpenCL-based open-source DEM code is key to keeping FluidX3D's efficiency; if there isn't one, we'll need a custom DEM kernel. Thanks for the insights, they help refine our direction.
Best regards, Jason
Hi Peter, @pk-guest
I've also been looking into open-source DEM tools that could potentially be coupled with FluidX3D. The basic approach would be: FluidX3D provides the hydraulic resistance (fluid force) acting on the particles, the DEM solves the particle collisions and motion and obtains the new coordinates, and these are returned to FluidX3D; a rough coupling loop is sketched after the list below. Do you have any more efficient suggestions?
Here are some options:
- LIGGGHTS
- YADE
- ...
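As mentioned above, a rough per-interval coupling loop could look like the following pseudocode; every function name here (`read_particle_forces`, `dem.advance`, ...) is a placeholder for whatever interface the chosen DEM tool exposes, not an existing FluidX3D or DEM API:

```cpp
// hypothetical LBM-DEM coupling loop (placeholder names, not a real API)
while(t < t_end) {
	lbm.run(n_coupling);                      // 1. advance the fluid by n_coupling LBM time steps
	const auto F = read_particle_forces(lbm); // 2. hydrodynamic (IBM feedback) force on each Lagrangian point
	dem.set_external_forces(F);               // 3. hand the fluid forces to the DEM solver
	dem.advance(n_coupling*dt);               // 4. DEM resolves collisions and rigid-body motion
	write_particle_state(lbm, dem.get_positions(), dem.get_velocities()); // 5. update Lagrangian points in FluidX3D
	t += n_coupling*dt;
}
```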
If you have any other questions or need further information, please let me know.
Best regards, Jason
Hi Jason,
I am not up to date on open-source DEM software, but most of them have no native GPU solver yet, or they have a lot of other limitations. LIGGGHTS has no GPU solver yet. But I have to check the current situation to get an accurate picture.
Only one or two commercial DEM tools are good: EDEM and Rocky. Coupling to EDEM is not so difficult because there is a coupling interface, but then you have a hybrid approach, a mix of open-source and commercial code, though with the advantage of full GPU solvers.
How many particles do you want to handle with the DEM solver? The coupling will communicate via the CPU, which will slow down the solvers a little.
Peter
Hi Peter, for the CFD coupling I would probably need fewer than 10 million particles. It would be best to compile and run the DEM together with FluidX3D to reduce communication costs. I have looked into common open-source DEM software such as YADE and LIGGGHTS; they seem to be driven directly by command scripts, and I haven't found a suitable way to couple them at the code level yet.
Best regards, Jason
Hi,
I have never used the open-source ones, but LIGGGHTS should have the capability for coupling; I can check it, give me a little time. 10M particles is OK, still not too much. Do you want to simulate a high particle density, i.e. a dense-phase simulation where the particles also influence the air, or just a low particle density where only the air influences the particles?
Regards Peter
Hi Peter, I'd like to use the immersed boundary method, where the particles here represent Lagrangian points. Therefore, it doesn't require a large number of DEM points, but it does utilize the two-way coupling in FluidX3D. It would be ideal if the DEM supports the cluster feature to represent rigid body motion with multiple points. Regards, Jason
Hi Jason, so you want the particles to influence the fluid via IBM, right? I am asking because there are several understandings of this in the community, so let's be 100% sure we are talking about the same thing:
- Option 1: the fluid influences the particles, but the particles have no influence on the fluid (the general approach for low particle numbers).
- Option 2: the fluid influences the particles and the particles influence the fluid too.
Most DEM codes support multisphere particles, so clusters should be fine. A particle size distribution or different particle types make it a little more complicated, but possible to implement.
I am focusing on Option 2: the fluid influences the particles and the particles influence the fluid too. The two-way coupling for this part is already implemented in the particle functionality of FluidX3D, see the `integrate_particles` function in kernel.cpp. The main task of the DEM code is to solve the rigid-body equations of motion from the feedback force of the flow field plus the other forces, and to obtain the velocity and displacement of each Lagrangian point. Regards, Jason
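For completeness, the rigid-body equations the DEM side would integrate, with the IBM feedback forces summed over a body's Lagrangian points, are the standard Newton-Euler equations

$$m\frac{d\mathbf{U}_c}{dt}=\sum_k\mathbf{F}_k^{\mathrm{fluid}}+\mathbf{F}^{\mathrm{coll}}+m\mathbf{g},\qquad \mathbf{I}\frac{d\boldsymbol{\omega}}{dt}+\boldsymbol{\omega}\times(\mathbf{I}\boldsymbol{\omega})=\sum_k(\mathbf{X}_k-\mathbf{X}_c)\times\mathbf{F}_k^{\mathrm{fluid}}+\mathbf{T}^{\mathrm{coll}},$$

after which each Lagrangian point is moved with $\mathbf{U}_k=\mathbf{U}_c+\boldsymbol{\omega}\times(\mathbf{X}_k-\mathbf{X}_c)$.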
Hi,
Generally the DEM code sends the positions and velocities to the CFD, and the CFD sends back forces and torques at a defined interval. I have to check how FluidX3D can handle a variable/transient number of particles rather than only a static particle definition at the beginning, and how to get and send back the information for the same particles. For Option 1 I started an implementation, but not open source; Option 2 will be the next step, but I have to check some additional topics. I read somewhere that the integrated particles assume a particle diameter of half the voxel size, so I think some additional work is also needed to use the correct particle size, and to handle particles that are bigger than a voxel.
Hi,
here is a simple case for transient particle generation, which is a must-have for later DEM coupling. However, with a high number of particles it slows down extremely; maybe the initial positioning of all particles has to be rewritten.
For me it is still not clear:
- How do we get back the force for the particles (particle IDs)?
- Right now all particles have zero velocity, whereas a DEM solver provides position, velocity, angular velocity, orientation, etc. Position and velocity are the important ones, but currently we only define the position; for the force, the relative velocity matters.
Feel free to change, update or modify it to make it better; this is a basis for two-way coupling with external codes.
```c
/**/void main_setup() { // particle test with transient particle generation; required extensions in defines.hpp: VOLUME_FORCE, FORCE_FIELD, MOVING_BOUNDARIES, PARTICLES, INTERACTIVE_GRAPHICS
// ################################################################## define simulation box size, viscosity and volume force ###################################################################
const float si_T = 3000000.0f;
const ulong lbm_T = units.t(si_T);
const ulong lbm_dt = 1ull;
const uint L = 128u;
const float Re = 1000.0f;
const float u = 0.1f;
uint my_k = 0u;
uint my_couter = 0u;
uint max_number_of_particles = 200000u;
//LBM lbm(L, L, L, units.nu_from_Re(Re, (float)(L-2u), u), 0.0f, 0.0f, -0.00001f, cb(L/4u), 2.0f);
LBM lbm(L, L, L, units.nu_from_Re(Re, (float)(L-2u), u), 0.0f, 0.0f, -0.00001f, max_number_of_particles, 2.0f); // 200000 particles buffer for the simulation
// ###################################################################################### define geometry ######################################################################################
const uint Nx = lbm.get_Nx(), Ny = lbm.get_Ny(), Nz = lbm.get_Nz();
uint seed = 42u;
for(ulong n=0ull; n<lbm.particles->length(); n++) { // position all particles in one point at the beginning; they are moved to their correct positions later
	lbm.particles->x[n] = Nx; // 0u; // random_symmetric(seed, 0.5f*lbm.size().x/4.0f);
	lbm.particles->y[n] = Ny; // 0u; // random_symmetric(seed, 0.5f*lbm.size().y/4.0f);
	lbm.particles->z[n] = Nz; // 0u; // random_symmetric(seed, 0.5f*lbm.size().z/4.0f);
	//lbm.particles->x[n] = random_symmetric(seed, 2.5f*lbm.size().x);
	//lbm.particles->y[n] = random_symmetric(seed, 2.5f*lbm.size().y);
	//lbm.particles->z[n] = random_symmetric(seed, 2.5f*lbm.size().z);
	//lbm.particles->
}
//const uint Nx=lbm.get_Nx(), Ny=lbm.get_Ny(), Nz=lbm.get_Nz();
parallel_for(lbm.get_N(), [&](ulong n) { uint x=0u, y=0u, z=0u; lbm.coordinates(n, x, y, z);
if(z==Nz-1) lbm.u.y[n] = u;
if(x==0u||x==Nx-1u||y==0u||y==Ny-1u||z==0u||z==Nz-1u) lbm.flags[n] = TYPE_S; // all non periodic
}); // ####################################################################### run simulation, export images and data ##########################################################################
lbm.graphics.visualization_modes = VIS_FLAG_LATTICE | VIS_PARTICLES;// | VIS_STREAMLINES;
lbm.run(0u, lbm_T); // initialize simulation
while (lbm.get_t() <= lbm_T) // main simulation loop
{
lbm.run(lbm_dt, lbm_T); // run dt time steps
lbm.particles->read_from_device();
if (lbm.get_t() == (1u + my_k))
{
uint seed = 42u;
//for (ulong n = 0ull; n < lbm.particles->length(); n++)
for (ulong n = 0ull+ my_couter; n < 200u + my_couter; n++)//generate only 200 new particles
{
lbm.particles->x[n] = random_symmetric(seed, 0.5f * lbm.size().x / 4.0f);
lbm.particles->y[n] = random_symmetric(seed, 0.5f * lbm.size().y / 4.0f);
lbm.particles->z[n] = random_symmetric(seed, 0.5f * lbm.size().z / 4.0f);
//lbm.particles->
}
my_k += 1000u;//every N time
my_couter += 200u; //increase the counter
}
lbm.particles->write_to_device();
}
} /**/
```
Hi, I'm trying to add a new kernel array to pass back the forces on the particles. Since the particles currently use the immersed boundary method, the forces exerted by the fluid on the particles can be obtained directly; a minimal sketch of what I mean is below.
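Here the buffer name `particles_F`, the structure-of-arrays layout, and the kernel signature are all hypothetical illustrations of the idea, not the actual `integrate_particles` code:

```c
// hypothetical OpenCL C kernel fragment: store the fluid (IBM) force on each Lagrangian point
// in a new global buffer particles_F of length 3*particle_N, so the host can read it back
// and forward it to a DEM / rigid-body solver
kernel void particle_forces_sketch(global float* particles_F, const uint particle_N) {
	const uint n = get_global_id(0);
	if(n>=particle_N) return;
	// ... interpolate the fluid velocity at the particle position and compute the IBM force (Fx, Fy, Fz) ...
	const float Fx=0.0f, Fy=0.0f, Fz=0.0f; // placeholders for the computed force components
	particles_F[              n] = Fx;
	particles_F[   particle_N+n] = Fy;
	particles_F[2u*particle_N+n] = Fz;
}
```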
Hi,
the initial particle velocity is also important, so some additional function would be needed to use it. However, particle size vs. voxel size is still not solved; maybe defining the solid as particles could also be an alternative approach. I also have to run some tests to check which approach is best with a high number of particles.
Hi @jasonxauat,
I have finally done the multi-GPU particles implementation. Your general approach was very good already, but some changes were still needed and some more optimizations were possible:
- The force spreading in your implementation is a bit troubling. My solution here is to do force spreading on all particles in the domain [including the halo layers](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-464b1d19d4b616b9609031b48429081b2c215328d9f98bc5cbeac6b2b84fdbf3R2027-R2038), but not for particles outside of the domain. I've verified simulation output vs. single-domain and this works as intended. If particles in halo layers were not included in the force spreading stage, sedimentation along the domain boundary would incorrectly be slowed down.
- I found a very elegant solution to the problem that particles could have moved outside the domain during advection, which would complicate unifying particles from all domains into a single buffer again, as you don't know which domain did the modification. You solved it with an additional `updated_domains` global buffer. My solution: [write `NaN` into the x-coordinate](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-464b1d19d4b616b9609031b48429081b2c215328d9f98bc5cbeac6b2b84fdbf3R2050) of all particles that are not advected in the domain. On the host side, [all particles with a valid x-coordinate](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-b0ec52269255a923607e08208f1208efa9a69b0941a18fed4ed144851647ec79R1388) can then be gathered from their respective domain again, no data is lost, and no extra memory allocation/access is required (a simplified sketch of this gather is shown after this list).
- My final optimization was that after the particle buffer has been unified in domain 0, I don't do another CPU copy to broadcast the particle buffers from domain 0 to the other domains. Instead, [I copy domain 0 from CPU memory directly to all other domains in GPU memory](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-b0ec52269255a923607e08208f1208efa9a69b0941a18fed4ed144851647ec79R1397-R1401), skipping the CPU-to-CPU copies.
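To illustrate the NaN-marking gather on the host side, here is a simplified stand-in (only the x-array is shown, y and z are handled the same way; see the linked commit for the actual implementation):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// domain_x[d] is the full-length x-coordinate array downloaded from domain d after advection;
// a NaN x-coordinate means "this particle was not advected by domain d", so the first valid
// copy found across the domains is written into the unified buffer.
void unify_particles_x(const std::vector<std::vector<float>>& domain_x, std::vector<float>& unified_x) {
	const std::size_t D = domain_x.size(), N = unified_x.size();
	for(std::size_t n=0; n<N; n++) {
		for(std::size_t d=0; d<D; d++) {
			if(!std::isnan(domain_x[d][n])) { unified_x[n] = domain_x[d][n]; break; }
		}
	}
}
```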
Thanks a lot for your input, and have fun with update v3.5!
Kind regards, Moritz
Hi Moritz,
Thanks so much for finalizing the multi-GPU particles implementation and sharing your key improvements—this is really valuable!
Your fixes are spot-on: adjusting the force spreading to include halo-layer particles (to avoid the boundary sedimentation issue) and marking non-advected particles with NaNs (instead of an extra buffer) are clever simplifications. Skipping the redundant CPU-to-CPU copies when broadcasting the unified buffer also boosts efficiency nicely.
I’m eager to dive into v3.5 and learn from your code. I’ve been looking at expanding IBM-related features (e.g., more complex geometries, DEM, FSI) and wanted to ask if you’d be open to collaborating on that down the line?
Thanks again for your hard work!
Best regards,
Jason
Hello, I would like to ask whether you have achieved an implementation of IB-LBM with complex dynamic boundaries, because I want to use such a scheme for fluid dynamics simulations of flexible bodies.
Hello! Thank you for your interest in this project. Currently, I'm developing the IB (Immersed Boundary) module by referencing the particle function framework, and the implementation of fixed IB-LBM is underway with ongoing testing. However, simulating flexible bodies requires integrating corresponding solid mechanics equations to capture deformation and dynamic responses, which is a key challenge for the current development. Do you have any feasible solutions or practical experience to share regarding the coupling of fluid dynamics (LBM) and solid mechanics for flexible bodies?
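For reference, in the classical Peskin-type IB treatment of flexible boundaries the missing solid-mechanics piece is an elastic energy functional of the boundary configuration $\mathbf{X}(s,t)$, from which the Lagrangian force density follows as

$$\mathbf{F}(s,t)=-\frac{\partial E[\mathbf{X}]}{\partial\mathbf{X}},\qquad E=\frac{k_s}{2}\int\left(\left|\frac{\partial\mathbf{X}}{\partial s}\right|-1\right)^2 ds+\frac{k_b}{2}\int\left|\frac{\partial^2\mathbf{X}}{\partial s^2}\right|^2 ds,$$

with stretching stiffness $k_s$ and bending stiffness $k_b$; this force is then spread to the lattice exactly as in the rigid-particle case, so in principle only the force computation itself would need to be added on top of the existing particle/IB machinery.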