fau
fau copied to clipboard
quadtree improvement
It may be the waste of time but I attempted to improve the current quadtree implementation (basically absolute rework of something unfinished). Though I choose rather simple path of relying on arc a lot. This can be fixed but that is also utter waste of time if pr is not accepted at the end.
I also modified a fmath while searching for methods I needed for quadtree (little code clean up and bug fix (i just happened to spot it)).
You can find all reasoning in code.
It looks like this quadtree is optimized specifically for moving already-inserted objects, and I don't think this should be part of the "standard" quadtree implementation. Have you tested whether it is significantly faster than rebuilding the whole tree every update?
I did not realize you are actually rebuilding the tree each frame. Yes it is faster.
reasoning
Entities does not change quadrant very often and update does mostly noting When it does though its usually changing the quadrant. In your implementation, whenever you split, you reinsert all objects in the quadrant. This makes quadtree update nonlinear as some objects are processed multiple times. My update method tries to loop trough neighbours or the node and assign entity to it.
I have to admit my code is optimized for mindustry as you can insert buildings just once and don't update them for a livetime. I did not expect fau to be more then mindustry engine.
Its also true that what really takes lot of time is Looping trough results of intersect
. That's why I bring concept of groups (teams). I already did this change before in my own game and performance bonus was Decent.
proof
here you can find a go implementation of what I imagine is a perfect quadtree. The example of running it with big amount of entities is here.
I have to admit my code is optimized for mindustry as you can insert buildings just once and don't update them for a livetime.
Can't the current implementation support that just as well? Updating doesn't matter for stationary objects.
Its also true that what really takes lot of time is Looping trough results of intersect. That's why I bring concept of groups (teams).
I don't think this should be part of the quadtree. The Java version of mindustry just has a separate quadtree for each team. Most games will not need multiple groups in one quadtree.
Can't the current implementation support that just as well? Updating doesn't matter for stationary objects.
As you stated you are cleaning quadtree each frame and reinserting all objects. If I understand correctly that means you insert each building every frame. When you preserve state of quadtree you have to insert building upon creation and delete it upon destruction which hardly happens. That's where the improvement is.
I don't think this should be part of the quadtree. The Java version of mindustry just has a separate quadtree for each team. Most games will not need multiple groups in one quadtree.
I actually agree, solving this by replacing complicated group management with multiple quadtrees seems better and it might also be as efficient. You have to loop trough group or loop trough more nodes.
i should probably study the java codebase bit more before i make blank statements
As you stated you are cleaning quadtree each frame and reinserting all objects. If I understand correctly that means you insert each building every frame.
Only for moving objects, like units and bullets. Buildings do not work this way; I only insert/remove them when the building is created/destroyed.
I see so you have 2 quadtrees for each team, one for static things and one for dynamic.
Entities does not change quadrant very often and update does mostly noting When it does though its usually changing the quadrant. In your implementation, whenever you split, you reinsert all objects in the quadrant. This makes quadtree update nonlinear as some objects are processed multiple times.
What about this. I can probably make a simulation to test this in nim just to verify if it actually helps. Make some objects to move around and see how long it takes for both implementations to run.
Also I earlier sent you and already hinted that quadtree can be implemented without recursive structure. Is it reasonable?
@Anuken I conducted a benchmarks and here are results:
# current implementation
CPU Time [old] 7.625s
# current implementation but optimized by me
CPU Time [optimized] 6.240s
# updating approach presented in this pr
CPU Time [updating] 0.739s
CPU Time [updating with moving objects] 1.047s
for reference here is the benchmark code:
import times, random, strutils, quadtree_old, quadtree_updatable
template benchmark(benchmarkName: string, code: untyped) =
block:
let t0 = epochTime()
code
let elapsed = epochTime() - t0
let elapsedStr = elapsed.formatFloat(format = ffDecimal, precision = 3)
echo "CPU Time [", benchmarkName, "] ", elapsedStr, "s"
let
a = newQuadtree[Rect](rect(0, 0, 10000, 10000))
b = quadtree_old.newQuadtree[Rect](rect(0, 0, 10000, 10000))
c = quadtree_updatable.newQuadtree[int](rect(0, 0, 10000, 10000))
var objects = newSeq[Rect](10000)
for o in objects.mitems:
o = rect(rand(0f32..10000f32), rand(0f32..10000f32), 0, 0)
benchmark "old":
for i in 0..1000:
for o in objects:
b.insert(o)
b.clear()
benchmark "optimized":
for i in 0..1000:
for o in objects:
a.insert(o)
a.clear()
var quads = newSeq[quadtree_updatable.Quadtree[int]](10000)
for i, o in objects:
quads[i] = c.insert(o, i)
benchmark "updating":
for i in 0..1000:
for i, o in objects:
quads[i] = quads[i].update(o, i)
type Dummy = object
pos, vel: Vec2
quad: quadtree_updatable.Quadtree[int]
var dummies = newSeq[Dummy](10000)
for d in dummies.mitems:
d.pos = vec2(rand(2000f32..8000f32), rand(2000f32..8000f32))
d.vel = vec2l(rand(0f32..pi2.float32), rand(0f32..0.5f32))
let
d = quadtree_updatable.newQuadtree[int](rect(0, 0, 10000, 10000))
e = newQuadtree[Rect](rect(0, 0, 10000, 10000))
for i, dum in dummies.mpairs:
dum.quad = d.insert(rect(dum.pos.x, dum.pos.y, 0, 0), i)
benchmark "updating with moving objects":
for i in 0..1000:
for i, d in dummies.mpairs:
d.quad = d.quad.update(rect(d.pos.x, d.pos.y, 0, 0), i)
d.pos += d.vel
That looks good - why close the PR?
The quadtree can still be improved. Currently we are relying on arc and storing references to quadtree all over the place. My plan is to change the tree structure. Replace pointers with indexes and store all nodes in continuous memory block. The advantage is that we have to maintain just one pointer. Disadvantage is that reallocating node storage can produce lag if tree gets too rundown. Second issue can be solved by reusing closed nodes in different places though code will get complex.
Are there any reference implementations that store quadtree leaves in a continuous memory block?
here is one. Its optimized to play with gc, there isn't any other reason. Current implementation also creates reference cycles, that can be avoided.