Multi-GPU Particle Tracking & Immersed Boundary

Open jasonxauat opened this issue 5 months ago • 25 comments

  • Why doesn’t FluidX3D support multi-GPU passive particle tracking? What’s the biggest technical roadblock here? (e.g., data transfer between GPUs? Syncing particle positions?)
  • Any ideas for implementing multi-GPU support? Would splitting the domain across GPUs work?

Also, I’m trying to implement an immersed boundary method by adapting the existing Particle code; so far it seems to work.

Thanks in advance!

Regards, Jason

jasonxauat avatar Jul 29 '25 03:07 jasonxauat

Hi @jasonxauat,

particles on multi-GPU require a lot of extra logic to determine which particles move between GPU domains and have to be copied over in between time steps. There are good solutions for this in the literature - especially in immersed-boundary codes for blood flow simulations, e.g. Ames et al., Kotsalos et al. I was just too lazy to implement it.

Kind regards, Moritz

ProjectPhysX avatar Jul 30 '25 07:07 ProjectPhysX

Hi, Dr. Moritz, Based on your suggestion, I've developed a simpler modification plan to enable multi-GPU support for particles in FluidX3D. Here's a summary of the approach:

Proposed Multi-GPU Particle Support Solution

Instead of implementing complex particle migration logic between GPU domains, we propose a simpler approach where each GPU stores the complete particle list, but only computes interactions for particles within its domain.

Key Modifications:

  1. Full Particle Replication: Each GPU domain stores the complete particle list rather than partitioning particles across domains.

  2. Domain-aware Computation: Modify the kernel to check if each particle is within the current GPU's domain before computing forces:

    // Only compute forces for particles in current domain
    bool in_domain = (p.x >= min_x && p.x < max_x && 
                      p.y >= min_y && p.y < max_y && 
                      p.z >= min_z && p.z < max_z);
    
    if(in_domain) {
        // Compute forces and spread to fluid grid
    }
    
  3. Synchronization: Add a new communicate_particles() function to synchronize particle positions across all GPU domains after each time step (see the sketch after this list).

  4. Transfer Kernels: Implement transfer_extract_particles and transfer_insert_particles kernels to handle full particle list transfers between domains.

  5. Memory Management: Modify memory allocation so each LBM_Domain allocates space for the complete particle set rather than a partitioned subset.

  6. Remove Limitation: Eliminate the check that prevents particles from running in multi-GPU mode.
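
As a rough illustration of step 3, here is a minimal host-side sketch of what communicate_particles() could look like under this full-replication approach. The DomainParticles struct and its members (updated, read_from_device, write_to_device) are hypothetical stand-ins for FluidX3D's per-domain particle storage, not the actual API:

    #include <vector>

    // Hypothetical stand-in for one GPU domain's host-side particle copy:
    struct DomainParticles {
        std::vector<float> x, y, z; // complete (replicated) particle position list
        std::vector<char> updated;  // 1 if this domain advected the particle in the current step
        void read_from_device() {}  // placeholder for the real device->host transfer
        void write_to_device() {}   // placeholder for the real host->device transfer
    };

    // Sketch of step 3: gather locally advected particles from every domain, merge on the host,
    // then broadcast the unified list back to all domains.
    void communicate_particles(std::vector<DomainParticles>& domains) {
        const std::size_t N = domains[0].x.size();
        std::vector<float> ux(N), uy(N), uz(N); // unified particle list
        for(DomainParticles& d : domains) { // 1. gather: each particle is advected in exactly one domain
            d.read_from_device();
            for(std::size_t n=0; n<N; n++) if(d.updated[n]) { ux[n]=d.x[n]; uy[n]=d.y[n]; uz[n]=d.z[n]; }
        }
        for(DomainParticles& d : domains) { // 2. broadcast the unified list to every domain
            d.x = ux; d.y = uy; d.z = uz;
            d.write_to_device();
        }
    }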

Advantages:

  • Simpler implementation without complex particle migration logic
  • Ensures all GPUs have complete particle information
  • Minimal changes to existing code structure
  • Maintains correctness of particle-fluid interactions

Trade-offs:

  • Increased memory usage (D times more, where D = number of GPUs)
  • Communication overhead for synchronizing full particle lists
  • Computational redundancy (all GPUs compute all particle movements, but only apply forces for local particles)

Would this approach be acceptable for enabling multi-GPU particle support in FluidX3D? Are there any concerns with this strategy or suggestions for improvement?

jasonxauat avatar Jul 31 '25 08:07 jasonxauat

Hi @jasonxauat,

that approach sounds very reasonable. The memory overhead isn't even that bad: since it could happen that all particles move from one domain into another, each domain must have enough free VRAM to hold all particles anyway, so it can/should allocate the entire particle buffer in the first place, and then no memory reallocation is ever required, which is a big plus.

I'll give the implementation a shot, have another week of vacation in 2 weeks.

Thanks a lot for sharing your ideas!

Kind regards, Moritz

ProjectPhysX avatar Aug 03 '25 15:08 ProjectPhysX

Thank you for your approval. With the help of qwen3-coder, I have modified the code over the past few days, and the approach basically works. However, because the particle arrays have to be read and written between the host and the device domains, the efficiency is still not very high, even with the offset read-and-write functions (enqueue_read_from_device & enqueue_write_to_device). The difficulty here is that at least one array needs to be synchronized whose values mark which particles were updated.

Is there any way to achieve fast reading and writing of single particles, e.g. exchanging data only on the GPUs? Could you take a look at my code?
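
For reference, what I mean by offset-based transfers is roughly the following, written here with plain OpenCL calls rather than FluidX3D's Memory wrapper. The buffer and queue names are only illustrative, and it only helps if the updated particles are (nearly) contiguous in the buffer:

    #include <CL/cl.h>

    // Concept sketch: transfer only the sub-range [first, first+count) of one particle
    // coordinate buffer instead of the whole array (plain OpenCL, illustrative names).
    cl_int read_particle_range(cl_command_queue queue, cl_mem device_x, float* host_x, size_t first, size_t count) {
        return clEnqueueReadBuffer(queue, device_x, CL_TRUE, first*sizeof(float), count*sizeof(float), host_x+first, 0, nullptr, nullptr);
    }
    cl_int write_particle_range(cl_command_queue queue, cl_mem device_x, const float* host_x, size_t first, size_t count) {
        return clEnqueueWriteBuffer(queue, device_x, CL_TRUE, first*sizeof(float), count*sizeof(float), host_x+first, 0, nullptr, nullptr);
    }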

[image attachment]

src.zip

20250805_bugfixed.zip

jasonxauat avatar Aug 04 '25 01:08 jasonxauat

Hi Jason,

did you implement the immersed boundary for moving/rotating parts too? Is it better compared to revoxelization, or is it just a different philosophy?

regards Peter

pk-guest avatar Aug 08 '25 17:08 pk-guest

Hi Jason,

did you implement the immersed boundary for moving/rotating parts too? Is it better compared to revoxelization, or is it just a different philosophy?

regards Peter

Hi Peter,

I have tried implementing the immersed boundary method by treating particles as nodes. Since the particle functionality is inherently based on the immersed boundary approach, it involves calculating particle forces and updating their positions. When using the immersed boundary for walls, it's necessary to appropriately modify the kernel to control the particle positions. So far, I've only experimented with stationary nodes, but moving or rotating nodes should also be feasible with additional control mechanisms.

There have been numerous studies comparing IB with revoxelization (bounce-back), such as C. Shu's work or my previous paper. In short, IB is better suited to handling curved surfaces and moving boundaries, but its accuracy requires additional correction. In FluidX3D, the classical explicit diffuse IB method is used, and additional processing is required to satisfy the no-slip wall condition, e.g.:

Wu, J. and C. Shu, Implicit velocity correction-based immersed boundary-lattice Boltzmann method and its applications. Journal of Computational Physics, 2009. 228(6): p. 1963-1979.
Kang, S.K. and Y.A. Hassan, A comparative study of direct-forcing immersed boundary-lattice Boltzmann methods for stationary complex boundaries. International Journal for Numerical Methods in Fluids, 2011. 66(9): p. 1132-1158.
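
To make the "additional correction" concrete, here is a minimal sketch of the direct-forcing idea from the Kang & Hassan reference above: the Lagrangian force is chosen so that the fluid velocity interpolated to the boundary node relaxes toward the prescribed wall velocity. This is illustrative code only, not FluidX3D's actual kernel:

    #include <array>

    // Direct-forcing IB at one Lagrangian node (illustrative, lattice units, typically dt = 1):
    // the force density pushes the interpolated fluid velocity u_interp toward the desired
    // boundary velocity u_b; it is then spread back to nearby grid nodes with the same
    // delta-function weights that were used for the velocity interpolation.
    std::array<float,3> direct_forcing(const std::array<float,3>& u_interp, const std::array<float,3>& u_b, float rho, float dt) {
        std::array<float,3> f;
        for(int i=0; i<3; i++) f[i] = rho*(u_b[i]-u_interp[i])/dt;
        return f;
    }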

Best regards, Jason

jasonxauat avatar Aug 13 '25 02:08 jasonxauat

Hello Jason,

thanks for your feedback. I thought you had implemented IBM for geometry motion. For particles it is also interesting, especially if you want to model complex shapes. However, a DEM kernel is missing at the moment, but coupling with an external DEM tool could be a good feature.

Peter

pk-guest avatar Aug 14 '25 15:08 pk-guest

Hi Peter,

I completely agree with the DEM coupling idea. Since the current setup lacks a built-in DEM kernel, linking with an external DEM tool makes perfect sense. However, it's crucial to find a suitable DEM tool based on OpenCL; otherwise, FluidX3D would lose its advantage of high computational efficiency.

Best regards, Jason

jasonxauat avatar Aug 15 '25 02:08 jasonxauat

Coupling it with an external DEM tool is the easiest part. The tricky parts are the CFD solver and possibly the communication with a huge number of particles, but up to 10M should be fine. I don't know of any good open-source DEM tool with a high-performance GPU solver at the moment; only commercial tools have usable GPU solvers. Most of them are CUDA-based, and the communication runs via the CPU, which slows things down a little. So it is possible to couple two tools, with some disadvantages compared to one tool with internal DEM. But I don't know if somebody has started to implement a DEM kernel for FluidX3D.

pk-guest avatar Aug 15 '25 07:08 pk-guest

Hi Peter,

High-performance open-source GPU DEM tools are rare too—most are commercial, CUDA-based, and (as you said) have CPU communication lags. So finding OpenCL-based open-source DEM code is key to keeping FluidX3D’s efficiency; if not, we’ll need a custom DEM kernel. Thanks for the insights—they help refine our direction.

Best regards, Jason

jasonxauat avatar Aug 29 '25 03:08 jasonxauat

Coupling it with an external DEM tool is the easiest part. The tricky parts are the CFD solver and possibly the communication with a huge number of particles, but up to 10M should be fine. I don't know of any good open-source DEM tool with a high-performance GPU solver at the moment; only commercial tools have usable GPU solvers. Most of them are CUDA-based, and the communication runs via the CPU, which slows things down a little. So it is possible to couple two tools, with some disadvantages compared to one tool with internal DEM. But I don't know if somebody has started to implement a DEM kernel for FluidX3D.

Hi Peter, @pk-guest

I've also been looking into open-source DEM tools that could potentially be coupled with FluidX3D. The basic approach is to use the DEM software to calculate the motion of the particles: FluidX3D provides the hydrodynamic force on each particle, the DEM code solves the particle collisions and motion, and the new coordinates are returned to FluidX3D. A rough sketch of this loop is shown below. Do you have any more efficient suggestions?
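
All interface names in the sketch (particle_forces, set_fluid_forces, advance, positions, velocities) are hypothetical stand-ins; the real calls depend on the DEM tool:

    #include <vector>

    // Illustrative stand-ins for the two solvers (hypothetical interfaces, not real APIs):
    struct Fluid { void run(unsigned long long dt) {} std::vector<float> particle_forces() { return {}; }
                   void set_particles(const std::vector<float>& pos, const std::vector<float>& vel) {} };
    struct DEM   { void set_fluid_forces(const std::vector<float>& f) {} void advance(unsigned long long dt) {}
                   std::vector<float> positions() { return {}; } std::vector<float> velocities() { return {}; } };

    // One coupling cycle per interval: fluid -> forces -> DEM -> positions/velocities -> fluid.
    void couple(Fluid& lbm, DEM& dem, unsigned long long T, unsigned long long dt) {
        for(unsigned long long t=0ull; t<T; t+=dt) {
            lbm.run(dt);                                          // advance the fluid by one coupling interval
            dem.set_fluid_forces(lbm.particle_forces());          // hydrodynamic forces on the particles
            dem.advance(dt);                                      // DEM: collisions + rigid-body motion
            lbm.set_particles(dem.positions(), dem.velocities()); // updated particle state back to the fluid
        }
    }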

Here are some options:

  • LIGGGHTS
  • YADE
  • ...

If you have any other questions or need further information, please let me know.

Best regards, Jason

jasonxauat avatar Sep 16 '25 09:09 jasonxauat

Hi Jason,

I am not up to date on open-source DEM software, but most of them have no native GPU solver yet, or they have a lot of other limitations. LIGGGHTS has no GPU solver yet. But I have to check the current situation to get an accurate picture.

Only one or two commercial DEM tools are good: EDEM and Rocky. Making a coupling to EDEM is not so difficult because there is a coupling interface, but then you have a hybrid approach, a mix of open-source and commercial code, though with the advantage of full GPU solvers.

How many particles do you want to handle with the DEM solver? The coupling will communicate via the CPU, which will slow down the solvers a little.

Peter

pk-guest avatar Sep 17 '25 03:09 pk-guest

Hi Peter, for the CFD coupling we would probably need fewer than 10 million particles. It would be best to compile and run the DEM code together with FluidX3D to reduce communication costs. I have had a look at common open-source DEM software such as YADE and LIGGGHTS; they seem to be driven directly by command scripts, but I haven't found a suitable way to couple the code yet.

Best regards, Jason

jasonxauat avatar Sep 17 '25 07:09 jasonxauat

Hi,

I never used the open-source tools, but LIGGGHTS should have the capability for coupling. I can check it, give me a little time. 10M particles is OK, still not too much. Do you want to simulate a high particle density, i.e. a dense-phase simulation where the particles also influence the air? Or just a low particle density where only the air influences the particles?

Regards Peter

pk-guest avatar Sep 17 '25 08:09 pk-guest

Hi Peter, I'd like to use the immersed boundary method, where the particles here represent Lagrangian points. Therefore, it doesn't require a large number of DEM points, but it does utilize the two-way coupling in FluidX3D. It would be ideal if the DEM supports the cluster feature to represent rigid body motion with multiple points. Regards, Jason

jasonxauat avatar Sep 17 '25 23:09 jasonxauat

Hi Jason, so you want the particles to have an influence on the fluid via IBM, right? I am asking because there are several understandings of this in the community, so which one do you mean, to be 100% sure that we talk about the same thing:

Option 1: the fluid has an influence on the particles, but the particles have no influence on the fluid. This is the general approach for low particle numbers.
Option 2: the fluid has an influence on the particles and the particles have an influence on the fluid too.

Most of the DEM codes support multisphere particles, so clusters should be fine. A particle size distribution or different types of particles make it a little more complicated, but possible to implement.

pk-guest avatar Sep 18 '25 04:09 pk-guest

Hi Jason, so you want the particles to have an influence on the fluid via IBM, right? I am asking because there are several understandings of this in the community, so which one do you mean, to be 100% sure that we talk about the same thing:

Option 1: the fluid has an influence on the particles, but the particles have no influence on the fluid. This is the general approach for low particle numbers.
Option 2: the fluid has an influence on the particles and the particles have an influence on the fluid too.

Most of the DEM codes support multisphere particles, so clusters should be fine. A particle size distribution or different types of particles make it a little more complicated, but possible to implement.

I am focusing on Option 2: the fluid has an influence on the particles and the particles have an influence on the fluid too. This two-way coupling has already been implemented in the particles functionality of FluidX3D, see the "integrate_particles" function in kernel.cpp. The main task of the DEM code is to solve the rigid body motion equations based on the feedback force from the flow field and the other forces, and to obtain the velocity and displacement of each Lagrangian point. Regards, Jason
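
P.S. A rough sketch of that rigid-body step, assuming the fluid has already produced one force per Lagrangian point. This is illustrative code only, not FluidX3D's integrate_particles kernel; all names are made up:

    #include <array>
    #include <vector>
    typedef std::array<float,3> float3v; // illustrative 3-vector type

    float3v cross(const float3v& a, const float3v& b) {
        return { a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0] };
    }

    // Reduce per-node fluid forces to a total force and torque on one rigid body, then
    // integrate the translation (rotation update omitted for brevity). Illustrative only.
    void rigid_body_step(const std::vector<float3v>& nodes, const std::vector<float3v>& node_forces,
                         float3v& center, float3v& velocity, float mass, float dt) {
        float3v F = {0.0f,0.0f,0.0f}, T = {0.0f,0.0f,0.0f};
        for(std::size_t i=0; i<nodes.size(); i++) {
            float3v r = { nodes[i][0]-center[0], nodes[i][1]-center[1], nodes[i][2]-center[2] };
            float3v t = cross(r, node_forces[i]);
            for(int k=0; k<3; k++) { F[k] += node_forces[i][k]; T[k] += t[k]; } // total force and torque
        }
        for(int k=0; k<3; k++) { velocity[k] += dt*F[k]/mass; center[k] += dt*velocity[k]; } // translation
        (void)T; // the angular momentum update and Lagrangian node repositioning would follow here
    }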

jasonxauat avatar Sep 18 '25 08:09 jasonxauat

Hi,

Generally the DEM code sends the positions and velocities to the CFD code, and the CFD code sends back force and torque at a defined interval. I have to check how FluidX3D can handle a variable/transient number of particles and not only a static particle definition at the beginning. The second part is getting and sending the information for the same particles. For Option 1 I started an implementation, but not for open source. Option 2 will be the next step, but I have to check some additional topics. The integrated particles calculate the particle diameter as half the voxel size, I read that somewhere, so I think some additional work is also needed to take the correct particle size into account, and to be able to handle particles bigger than a voxel.

pk-guest avatar Sep 19 '25 04:09 pk-guest

Hi,

here is a simple case for transient particle generation, which is a must-have for later DEM coupling. However, with a high number of particles it slows down extremely; maybe the initial positioning of all particles has to be rewritten.

For me it is still not clear:

  • How do we get back the force from the particles (IDs)?
  • Right now all particles have zero velocity, however from a DEM solver you get position, velocity, angular velocity, orientation etc. Position and velocity are important, but currently we only define the position, and for the force the relative velocity is important.

Feel free to change, update or modify it to make it better; this is a basis for two-way coupling with external codes.

/**/void main_setup() { // particle test with transient particle generation; required extensions in defines.hpp: VOLUME_FORCE, FORCE_FIELD, MOVING_BOUNDARIES, PARTICLES, INTERACTIVE_GRAPHICS
    // ############################################################ define simulation box size, viscosity and volume force ############################################################
    const float si_T = 3000000.0f;
    const ulong lbm_T = units.t(si_T);
    const ulong lbm_dt = 1ull;
    const uint L = 128u;
    const float Re = 1000.0f;
    const float u = 0.1f;
    uint my_k = 0u;
    uint my_couter = 0u;
    uint max_number_of_particles = 200000u;
    //LBM lbm(L, L, L, units.nu_from_Re(Re, (float)(L-2u), u), 0.0f, 0.0f, -0.00001f, cb(L/4u), 2.0f);
    LBM lbm(L, L, L, units.nu_from_Re(Re, (float)(L-2u), u), 0.0f, 0.0f, -0.00001f, max_number_of_particles, 2.0f); // particle buffer for the whole simulation
    // ###################################################################################### define geometry ######################################################################################
    const uint Nx = lbm.get_Nx(), Ny = lbm.get_Ny(), Nz = lbm.get_Nz();
    uint seed = 42u;
    for(ulong n=0ull; n<lbm.particles->length(); n++) { // position all particles in one point at the beginning; later we move them to their correct positions
        lbm.particles->x[n] = Nx; // alternatively: random_symmetric(seed, 0.5f*lbm.size().x/4.0f);
        lbm.particles->y[n] = Ny;
        lbm.particles->z[n] = Nz;
    }
    parallel_for(lbm.get_N(), [&](ulong n) { uint x=0u, y=0u, z=0u; lbm.coordinates(n, x, y, z);
        if(z==Nz-1u) lbm.u.y[n] = u;
        if(x==0u||x==Nx-1u||y==0u||y==Ny-1u||z==0u||z==Nz-1u) lbm.flags[n] = TYPE_S; // all non periodic
    });
    // ####################################################################### run simulation, export images and data ##########################################################################
    lbm.graphics.visualization_modes = VIS_FLAG_LATTICE|VIS_PARTICLES; // |VIS_STREAMLINES;
    lbm.run(0u, lbm_T); // initialize simulation
    while(lbm.get_t()<=lbm_T) { // main simulation loop
        lbm.run(lbm_dt, lbm_T); // run dt time steps
        lbm.particles->read_from_device();
        if(lbm.get_t()==(1u+my_k)) {
            uint seed = 42u;
            for(ulong n=0ull+my_couter; n<200u+my_couter; n++) { // generate only 200 new particles
                lbm.particles->x[n] = random_symmetric(seed, 0.5f*lbm.size().x/4.0f);
                lbm.particles->y[n] = random_symmetric(seed, 0.5f*lbm.size().y/4.0f);
                lbm.particles->z[n] = random_symmetric(seed, 0.5f*lbm.size().z/4.0f);
            }
            my_k += 1000u; // every N time steps
            my_couter += 200u; // increase the counter
        }
        lbm.particles->write_to_device();
    }
} /**/

pk-guest avatar Sep 28 '25 15:09 pk-guest

Hi, I'm trying to add a new kernel array to pass back the forces on the particles. Since the particles currently use the immersed boundary method, the forces exerted by the fluid on the particles can be directly obtained.
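
What I have in mind is roughly the following kernel fragment. It is illustrative only (the buffer layout and signature are not FluidX3D's actual integrate_particles kernel): in addition to advecting the particles, the kernel stores the IB force acting on each particle in a new buffer, which the host can read back and forward to a DEM solver.

    // Illustrative OpenCL kernel fragment with a new per-particle force output buffer:
    kernel void integrate_particles_with_force(global float* particles,   // particle positions (as in the existing particles buffer)
                                               global float* particle_F,  // NEW: per-particle force output, 3 floats per particle
                                               const uint particle_N) {
        const uint n = get_global_id(0);
        if(n>=particle_N) return;
        float3 Fn = (float3)(0.0f, 0.0f, 0.0f);
        // ... interpolate the fluid velocity, compute the IB force Fn and spread it to the grid as before ...
        particle_F[3u*n   ] = Fn.x; // store the force instead of discarding it
        particle_F[3u*n+1u] = Fn.y;
        particle_F[3u*n+2u] = Fn.z;
    }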

jasonxauat avatar Sep 29 '25 02:09 jasonxauat

Hi,

the initial particle velocity is also important, so with some additional functions it should be possible to use. However, particle size vs. voxel size is still not solved. Maybe defining solid cells as particles could also be an alternative approach. I also have to make some tests to check which approach is best with a high number of particles.

pk-guest avatar Sep 29 '25 05:09 pk-guest

Hi @jasonxauat,

I have finally done the multi-GPU particles implementation. Your general approach was very good already, but some changes were still needed and some more optimizations were possible:

  • The force spreading in your implementation is a bit troubling. My solution here is to do force spreading on all particles in the domain including the halo layers, but not for particles outside of the domain. I've verified simulation output vs. single-domain and this works as intended. If particles in halo layers were not included in the force spreading stage, sedimentation along the domain boundary would incorrectly be slowed down.
  • I found a very elegant solution to the problem that particles could have moved outside the domain during advection, which would complicate unifying particles from all domains into a single buffer again, as you don't know which domain did the modification. You solved it with an additional updated_domains global buffer. My solution: Write NaN into the x-coordinate of all particles that are not advected in the domain. On the host side, all particles with valid x-coordinate can then be gathered from their respective domain again, no data is lost, and no extra memory allocation/access is required (a condensed illustration follows after this list).
  • My final optimization was that after the particle buffer has been unified in domain 0, I don't do another CPU copy to broadcast the particle buffers from domain 0 to the other domains. Instead, I copy domain 0 from CPU memory directly to all other domains in GPU memory, skipping the CPU-to-CPU copies.
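
Here is a condensed host-side illustration of the NaN-marker idea (not the actual v3.5 source); it only shows the gathering step for the x-coordinates:

    #include <cmath>
    #include <vector>

    // Each domain writes NaN into the x-coordinate of every particle it did NOT advect,
    // so the host can rebuild the unified list by keeping, per particle, the one valid x.
    void unify_particles_x(const std::vector<std::vector<float>>& domain_x, std::vector<float>& unified_x) {
        for(const std::vector<float>& x : domain_x) {      // one host copy per GPU domain
            for(std::size_t n=0; n<unified_x.size(); n++) {
                if(!std::isnan(x[n])) unified_x[n] = x[n]; // exactly one domain holds the valid value
            }
        }
        // unified_x is then written from host memory directly to all domains' GPU memory
    }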

Thanks a lot for your input, and have fun with update v3.5!

Kind regards, Moritz

ProjectPhysX avatar Oct 01 '25 16:10 ProjectPhysX

Hi Moritz,

Thanks so much for finalizing the multi-GPU particles implementation and sharing your key improvements—this is really valuable!

Your fixes are spot-on: adjusting force spreading to include halo layer particles (to avoid boundary sedimentation issues) and using NaNs for non-advected particles (instead of an extra buffer) are clever simplifications. Skipping the redundant CPU-to-CPU copies when broadcasting the unified buffer also boosts efficiency nicely.

I’m eager to dive into v3.5 and learn from your code. I’ve been looking at expanding IBM-related features (e.g., more complex geometries, DEM, FSI) and wanted to ask if you’d be open to collaborating on that down the line?

Thanks again for your hard work!

Best regards,
Jason

Hi @jasonxauat,

I have finally done the multi-GPU particles implementation. Your general approach was very good already, but some changes were still needed and some more optimizations were possible:

* The force spreading in your implementation is a bit troubling. My solution here is to do force spreading on all particles in the domain [including the halo layers](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-464b1d19d4b616b9609031b48429081b2c215328d9f98bc5cbeac6b2b84fdbf3R2027-R2038), but not for particles outside of the domain. I've verified simulation output vs. single-domain and this works as intended. If particles in halo layers were not included in the force spreading stage, sedimentation along the domain boundary would incorrectly be slowed down.

* I found a very elegant solution to the problem that particles could have moved outside the domain during advection, which would complicate unifying particles from all domains into a single buffer again, as you don't know which domain did the modification. You solved it with an additional `updated_domains` global buffer. My solution: [Write `NaN` into the x-coordinate](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-464b1d19d4b616b9609031b48429081b2c215328d9f98bc5cbeac6b2b84fdbf3R2050) of all particles that are not advected in the domain. On the host side, [all particles with valid x-coordinate](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-b0ec52269255a923607e08208f1208efa9a69b0941a18fed4ed144851647ec79R1388) can then be gathered from their respective domain again, no data is lost, and no extra memory allocation/access is required.

* My final optimization was that after the particle buffer has been unified in domain 0, I don't do another CPU copy to broadcast the particle buffers from domain 0 to the other domains. Instead, [I copy domain 0 from CPU memory directly to all other domains in GPU memory](https://github.com/ProjectPhysX/FluidX3D/commit/c1487e5b8895cf436700a9aa9f67651dba974cf3#diff-b0ec52269255a923607e08208f1208efa9a69b0941a18fed4ed144851647ec79R1397-R1401), skipping the CPU-to-CPU copies.

Thanks a lot for your input, and have fun with update v3.5!

Kind regards, Moritz

jasonxauat avatar Oct 09 '25 06:10 jasonxauat

Hello, I would like to ask whether you have achieved an implementation of IB-LBM with complex dynamic boundaries, because I want to use such a scheme for fluid dynamics simulations of flexible bodies.

WangyanFR avatar Nov 07 '25 16:11 WangyanFR

Hello, I would like to ask whether you have achieved an implementation of IB-LBM with complex dynamic boundaries, because I want to use such a scheme for fluid dynamics simulations of flexible bodies.

Hello! Thank you for your interest in this project. Currently, I'm developing the IB (Immersed Boundary) module by referencing the particle function framework, and the implementation of fixed IB-LBM is underway with ongoing testing. However, simulating flexible bodies requires integrating corresponding solid mechanics equations to capture deformation and dynamic responses, which is a key challenge for the current development. Do you have any feasible solutions or practical experience to share regarding the coupling of fluid dynamics (LBM) and solid mechanics for flexible bodies?

jasonxauat avatar Nov 12 '25 02:11 jasonxauat