SPHinXsys
Making cache-friendly loops
https://github.com/Xiangyu-Hu/SPHinXsys/blob/1e0042b5040c6f59b9fa98e7634c33f63090fd1a/SPHINXsys/src/shared/particle_dynamics/solid_dynamics/solid_dynamics.cpp#L770-L778
While considering vectorization and how it could fit in, I ended up looking specifically at the part of the code linked above. This kind of loop is suboptimal because every iteration touches several different arrays. A data-oriented, cache-friendly approach would apply the operation to each array in turn, one loop after the other:
// One loop per output array; the order matters because rho_n_ and stress_PK1_ read the updated F_.
for (size_t index_i = 0; index_i < total_real_particles; ++index_i)
    pos_n_[index_i] += vel_n_[index_i] * dt * 0.5;
for (size_t index_i = 0; index_i < total_real_particles; ++index_i)
    F_[index_i] += dF_dt_[index_i] * dt * 0.5;
for (size_t index_i = 0; index_i < total_real_particles; ++index_i)
    rho_n_[index_i] = rho0_ / det(F_[index_i]);
for (size_t index_i = 0; index_i < total_real_particles; ++index_i)
    stress_PK1_[index_i] = F_[index_i] * material_->ConstitutiveRelation(F_[index_i], index_i);
Multi-array loops like this do not seem widespread in the codebase overall (most loops touch only one field), but the architecture around ParticleDynamics1Level in the library favors this kind of pattern.
It is a good starting point. We need some tests to see whether we actually get a benefit, and to work out how to combine this with parallelization.