game-of-leif
repeating borders
I like to add a repeating border option to all my CAs. It makes them more dynamic and really gives them a chance to unfold their patterns naturally. Without it, there is always interference from particles hovering at the border that don't react properly to the rules. Of course, it adds overhead and performance takes a hit; that's why I always make it optional. Basically, I browse through random simulations using closed borders until I see something interesting, then I toggle repeating borders to get a better picture of what it 'really' looks like.
Implementation is fairly easy:
set_particle_position(pos): if pos.x is greater than width => pos.x -= width; if it is smaller than 0 => pos.x += width. Same with height.
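A minimal sketch in C++ of that wrapping (Vec2 is just a stand-in for the engine's Vector2, and the world is assumed to span [0, width) x [0, height)):

struct Vec2 { float x, y; }; // stand-in for the engine's Vector2

// wrap a particle back into the world when it leaves either edge
void set_particle_position(Vec2 &pos, float width, float height) {
    if (pos.x >= width)      pos.x -= width;
    else if (pos.x < 0.0f)   pos.x += width;
    if (pos.y >= height)     pos.y -= height;
    else if (pos.y < 0.0f)   pos.y += height;
}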
getting the distance between 2 particles:
get_particle_distance(pos1, pos2, out virtual_pos): get the distance of the screen coordinates (same as before), THEN check if you can find a shorter distance between particle 1 on screen and any potential off-screen/virtual position of particle 2. Like so:
distance(pos1, pos2 + Vector2(width, 0))
distance(pos1, pos2 + Vector2(-width, 0))
distance(pos1, pos2 + Vector2(0, height))
distance(pos1, pos2 + Vector2(0, -height))
distance(pos1, pos2 + Vector2(width, height))
... and so on for all 8 neighbouring quadrants (un-optimized approach)
the function returns the shortest distance found and the virtual position of the second particle for use in any following vector operations
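And a rough, un-optimized sketch of the whole lookup; names and signature are made up for illustration, not taken from the project:

#include <cmath>

struct Vec2 { float x, y; }; // stand-in for the engine's Vector2

// try p2 in its on-screen position and in the 8 neighbouring virtual copies of
// the world; return the shortest distance and the winning virtual position
float get_particle_distance(Vec2 p1, Vec2 p2, float width, float height,
                            Vec2 &virtual_pos) {
    float best = -1.0f;
    for (int qx = -1; qx <= 1; ++qx) {
        for (int qy = -1; qy <= 1; ++qy) {
            Vec2 cand = { p2.x + qx * width, p2.y + qy * height };
            const float dx = p1.x - cand.x;
            const float dy = p1.y - cand.y;
            const float d = std::sqrt(dx * dx + dy * dy);
            if (best < 0.0f || d < best) {
                best = d;
                virtual_pos = cand; // use this in any following vector operations
            }
        }
    }
    return best;
}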
This is a good idea. "Strict" bounds create a very artificial, angular disturbance in the patterns, and no bounds just causes a lot of particles to be launched off into infinity. This is worth a try, and I agree that it should be optional as it could introduce significant overhead and changes in behavior.
Some of this can be reduced by checking if the bounds are within the radius of the particle; that shouldn't be too expensive. In any case, std::sqrt currently accounts for >50% of the cycles, so I don't expect the actual checks you suggest to take up much CPU time, but the additional interactions might. Then again, the particles clumped up around the bounds are "needlessly" interacting with each other constantly, so this approach can lead to them being more spread out across the world again, which in turn means fewer interactions to calculate, so it might even out somewhat.
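As a rough sketch of what I mean by that early-out (variable names assumed, not from the actual code): the virtual-position checks would only need to run for particles sitting within the interaction radius of a border.

// true if the interaction radius of a particle at (x, y) could reach across
// a border; only then do the 8 virtual copies need to be checked at all
bool near_border(float x, float y, float width, float height, float radius) {
    return x < radius || x > width - radius ||
           y < radius || y > height - radius;
}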
Thank you for your suggestions, they're much appreciated!
After some attempts and considerations it's clear that repeating bounds has too much of a performance impact. The single distance check currently being done is already very heavy:
const float r = std::sqrt(dx * dx + dy * dy);
I tried reducing the number of distance checks by checking the x and y difference first, but this also has a significant cost, which didn't turn out to be worth the reduced number of distance calculations using the code above. I'm very open to suggestions on how to solve this, as it's a tempting feature, but as it stands this functionality can reduce performance by 80-90% (!) as far as I can tell, making the simulation practically unusable.
What if we eliminated the additional sqrt() calls from the algorithm? We don't need the exact distance, just an approximation to see which virtual position is closest. Maybe a simple pre-calculated lookup table of sqrt results for integer inputs would suffice. Only if the rounded x in sqrt(x) is identical, or say within +/-1, would we need to fall back to an exact sqrt() comparison. Please tell me if you want me to elaborate further.
Please elaborate further. This is my previous attempt at such a check, and it doubled the frame time despite drastically reducing sqrt() calls:
bool dist(int dx, int dy, int max_x, int max_y, int r) {
// within the interaction radius directly (on screen)?
if (std::abs(dx) <= r && std::abs(dy) <= r) {
return true;
// otherwise, within the radius across the wrapped border?
} else if ((std::abs(dx) - max_x <= r && std::abs(dx) - max_x >= -r) && (std::abs(dy) - max_y <= r && std::abs(dy) - max_y >= -r)) {
return true;
}
return false;
}
abs() was as cheap as having the extra check for negative values. dx and dy are the result of the p1 values minus the p2 values.
I'm aware that this doesn't solve anything, you still need to calculate new dx/dy values for the attraction, it was merely to test the performance.
Such a lookup table could indeed save a lot if it's fast enough. It would need to contain ~3.7 million entries if it's to be pixel accurate, but I'm guessing that can be reduced as well?
------ EDIT: IGNORE THIS, SEE POST BELOW -----------
Sorry, my C++ is a bit rusty ;) Off the top of my head I would come up with this. I have to admit I'm not a great coder, so this might turn out to be stupid..
LOOKUP TABLE
int table_size= world.width * world.height;
float* sqrt_lookup= new float[table_size];
for(int i= 0; i < table_size; i++)
sqrt_lookup[i]= std::sqrt((float)i);
REPLACE YOUR
const float dx = p1pos.x - p2pos.x;
const float dy = p1pos.y - p2pos.y;
const float r = std::sqrt(dx * dx + dy * dy);
WITH:
float virt_pos_x, virt_pos_y;
float smallest_dist= -1;
float current_contender= 0;
// loop through all 9 quadrants with the current screen space being
// quad_x= 0, quad_y= 0 ( treated no different)
for(int quad_x= -1; quad_x <= 1; quad_x++)
for(int quad_y= -1; quad_y <= 1; quad_y++)
{
// virtual position of p2 in current quadrant
float quad_pos_x= p2pos.x + quad_x * world.width;
float quad_pos_y= p2pos.y + quad_y * world.height;
float potential_contender;
float fast_dist= fast_dist_check(p1pos.x, p1pos.y, quad_pos_x, quad_pos_y, current_contender, potential_contender);
if(smallest_dist == -1 || smallest_dist > fast_dist)
{
smallest_dist= fast_dist;
virt_pos_x= quad_pos_x;
virt_pos_y= quad_pos_y;
// we need to save the current contender x (as in sqrt(x))
// for the off-chance that we have to compare it against
// another x which is rounded to the same value
current_contender= potential_contender;
// i'll bet it's possible to break out of the loops early with something
// like this (assuming world.width >= world.height):
//
// if(smallest_dist < world.height / 2) break;
//
// but i can't prove it
}
}
const float dx = p1pos.x - virt_pos_x;
const float dy = p1pos.y - virt_pos_y;
const float r = std::sqrt(dx * dx + dy * dy);
float fast_dist_check(float x1, float y1, float x2, float y2, float contender, float &val)
{
float dx= x1 - x2;
float dy= y1 - y2;
val= dx * dx + dy * dy;
int val_rounded= (int)std::round(val);
// if we happen to look up the exact same item as our current_contender
// AND this x is smaller than the previously smallest x (as in sqrt(x))
// make sure this becomes the new smallest_dist by subtracting a tiny
// amount
if(val_rounded == (int)std::round(contender) && val < contender)
{
float epsilon= 0.0001f;
return sqrt_lookup[val_rounded] - epsilon;
}
return sqrt_lookup[val_rounded];
}
EDIT: Btw the size of the lookup table shouldn't make a difference from a performance standpoint, or does it? The memory use is negligible imho.
I already had a bad feeling about this and it turned out to be completely useless. First off, just to make sure we're on the same page: I was strictly talking about optimizing the code for repeating borders. Maybe there was a misunderstanding and you thought I was trying to eliminate the mandatory sqrt() calls, which are needed for the exact distance between particles and are used in the force calculations.
Secondly, my replacement code didn't need lookup tables or anything. It should have been as simple as this:
float virt_pos_x, virt_pos_y;
float smallest_dist= -1;
// loop through all 9 quadrants with the current screen space being
// quad_x= 0, quad_y= 0 ( treated no different)
for(int quad_x= -1; quad_x <= 1; quad_x++)
for(int quad_y= -1; quad_y <= 1; quad_y++)
{
// virtual position of p2 in current quadrant
float quad_pos_x= p2pos.x + quad_x * world.width;
float quad_pos_y= p2pos.y + quad_y * world.height;
float dx= p1pos.x - quad_pos_x;
float dy= p1pos.y - quad_pos_y;
// just use dist^2 to compare distances, no need for square roots
float fast_dist= dx * dx + dy * dy;
if(smallest_dist == -1 || smallest_dist > fast_dist)
{
smallest_dist= fast_dist;
virt_pos_x= quad_pos_x;
virt_pos_y= quad_pos_y;
// i'll bet it's possible to break out of the loops early with something
// like this (assuming world.width >= world.height):
//
// if(smallest_dist < world.height * world.height / 2) break;
//
// but i can't prove it
}
}
const float dx = p1pos.x - virt_pos_x;
const float dy = p1pos.y - virt_pos_y;
const float r = std::sqrt(dx * dx + dy * dy);
I hope you didn't spend too much time trying to figure out the brainfart in my last post ;)
Haha oh wow. I got the point about the mandatory sqrt calls, but I must admit I spent a long time trying to make the lookup table work! The state I left it in last night was a real mess...
I will try your new solution right away, thanks again for your effort.
Update: It seems to be working. The performance takes a big hit, but it's usable if you reduce particle count, and you can now reduce the world size to match the lower particle count.
It's included in the most recent releases: https://github.com/NiclasEriksen/game-of-leif/releases/tag/v0.3.2-alpha
You enable it through the dropdown menu in the UI.
I haven't run it through the profiler yet, but I'm sure there are some areas where performance can be improved.
Thanks! I love the new UI, by the way.
That's the range of performance hit I was expecting/hoping for. There are some errors in the code, though:
From line 222:
else if (BOUNDS_TYPE == BOUNDS_REPEATING) {
if ((p1pos.x - WORLD_SIZE.x) > 0) { p1->set_pos_x(0);}
else if (p1pos.x < 0) { p1->set_pos_x(WORLD_SIZE.y); }
else if ((p1pos.y - WORLD_SIZE.y) > 0) { p1vel.y *= -1; p1->set_pos_y(0);}
else if (p1pos.y < 0) { p1->set_pos_y(WORLD_SIZE.y); }
}
should be like this
else if (BOUNDS_TYPE == BOUNDS_REPEATING) {
if ((p1pos.x - WORLD_SIZE.x) > 0) { p1->set_pos_x(p1pos.x - WORLD_SIZE.x);}
else if (p1pos.x < 0) { p1->set_pos_x(p1pos.x + WORLD_SIZE.x); }
else if ((p1pos.y - WORLD_SIZE.y) > 0) { p1->set_pos_y(p1pos.y - WORLD_SIZE.y);}
else if (p1pos.y < 0) { p1->set_pos_y(p1pos.y + WORLD_SIZE.y); }
}
We want to preserve the "overflow" in the coordinates when the particle goes out of bounds.
I did some preliminary performance tests with sqrt lookup tables, by the way. They seem to be about 2.5x faster (using g++). I will open a new issue when I finally get around to setting up C++ for Godot and can run some tests within the actual simulation code.
I'm embarrassed that I didn't notice it until now; I didn't really test as well as I should have... Fixed in bf307ce and updated the previous release with new binaries.
If we could transition entirely to lookup tables I would be elated. Are you suggesting replacing the sqrt entirely?
Are you suggesting to replace the sqrt entirely?
Yes I am. My first try would be to have a table with a few million entries and then interpolate between them for increased precision.
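To sketch what I mean, assuming the table is indexed by the squared distance dx*dx + dy*dy and sized to cover the largest value that can occur in the world (names are purely illustrative):

#include <cmath>
#include <vector>

std::vector<float> sqrt_lookup;

void build_sqrt_lookup(int max_squared_dist) {
    sqrt_lookup.resize(max_squared_dist + 2); // +2 so i + 1 below stays in range
    for (int i = 0; i < (int)sqrt_lookup.size(); i++)
        sqrt_lookup[i] = std::sqrt((float)i);
}

// linear interpolation between the two nearest table entries
float sqrt_interp(float squared_dist) {
    const int i = (int)squared_dist;      // lower table index
    const float frac = squared_dist - i;  // remainder used as the weight
    return sqrt_lookup[i] + frac * (sqrt_lookup[i + 1] - sqrt_lookup[i]);
}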
But it's so hard to get a good read on how the simulation will be affected. Will there be less variety in the patterns if we lower the precision? How would we be able to tell? What's our goal here?
Usually I avoid introducing changes that make CAs / particle simulations "less deterministic". I would always have an option to start my simulations with a symmetric pattern of particles, and if the symmetry broke at any point, in any random ruleset, I considered it a failure. I was basically going for maximum randomness between rules but minimum randomness within a ruleset.
How would we be able to tell?
We can compare the presets to the original simulation, but nothing there is written in stone. I've mostly kept up with the changes there, such as the addition of "viscosity", but I have refrained from adding the time scale as I felt it was going to drastically impact the behavior based on CPU speed (for the same reason you outline here).
We can also add a toggle function to swap between using the hash table and the regular sqrt within the same run, so we can see if the behavior changes immediately, and if a "stable" pattern destabilizes.
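Something like this, as a sketch (use_sqrt_table would be the flag flipped from the UI, and sqrt_lookup_interp stands in for whichever table variant we end up with):

#include <cmath>

float sqrt_lookup_interp(float squared_dist); // table-based variant, defined elsewhere

// distance for one particle pair; the flag can be toggled mid-run so the same
// pattern can be observed with both methods
inline float pair_distance(float dx, float dy, bool use_sqrt_table) {
    const float d2 = dx * dx + dy * dy;
    return use_sqrt_table ? sqrt_lookup_interp(d2) : std::sqrt(d2);
}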
For the lower precision, I'm a little worried about it being jittery at low g factors, but right now the simulation is full of longer "jumps" that are apparently up to 50-100 pixels at a time anyway, which I don't think will be affected much. It doesn't feel that precise; even without the time scale mentioned above I still see roughly the same behavior when I import the presets from that repository, but again it's hard to tell. With many rulesets you have to rely on the initial distribution for them to emerge, requiring several retries.
interpolate between them for increased precision.
If that requires getting another result from the hash table, and using the remainder of the integer reduction as a weight, I can see this quickly diminishing your estimated 2.5x speed increase from above.
How about increasing the size of the hash table to effectively work at a higher resolution? A quarter pixel might be precise enough, as an example, if memory permits.
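As a sketch of that variant (SCALE is a made-up knob, not anything in the current code): the table would simply store SCALE entries per unit of squared distance, so a single rounded lookup gets finer without any interpolation.

#include <cmath>
#include <vector>

static const int SCALE = 4; // entries per unit of squared distance
std::vector<float> sqrt_lookup;

void build_sqrt_lookup(int max_squared_dist) {
    sqrt_lookup.resize((size_t)max_squared_dist * SCALE + 1);
    for (size_t i = 0; i < sqrt_lookup.size(); i++)
        sqrt_lookup[i] = std::sqrt((float)i / SCALE);
}

// single lookup, no interpolation: round the scaled squared distance
float sqrt_table(float squared_dist) {
    return sqrt_lookup[(int)(squared_dist * SCALE + 0.5f)];
}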
What's our goal here?
I personally enjoy the visual aspect of this and would gladly sacrifice some accuracy for more particles and smoother frame rates. The repeating bounds became an obvious priority now that I saw it in action, and I would love it if that became the primary environment for these patterns to emerge in as it gives a much more organic feeling.
I agree 100% with everything you said. Great to see that we're thinking along the exact same lines.
Unfortunately, the lookup table didn't speed up the original simulation. In fact, it made the performance significantly worse, and that's without interpolating, not even rounding. I wasn't that surprised, to be honest. From what I read, the sqrt() algorithm is so optimized that it may be faster than accessing memory (don't quote me on that). I did the initial test runs that seemed promising on my notebook and used another compiler; maybe this accounts for the discrepancy. Also different use cases and program flow, so...
Another thing I tried was writing a similar particle interaction algorithm that doesn't use sqrt() calls at all. The closest I came so far resulted in far less interesting patterns and "only" doubled the performance. I doubt that I can come up with something other people haven't already thought of.