DotRecast
Performance with System.Numerics and Multithreading
First of all thanks for the port.
I refactored the code to use Vector3 instead of RcVec3f, and DtCrowd ran roughly 4x faster.
Parallelism also improved a lot.
Why is RcVec3f being used instead?
Compatibility Issues
System.Numerics.Vector3 leverages SIMD (Single Instruction, Multiple Data) instructions for optimized vector operations, which can be very fast for certain operations. However, SIMD operations can significantly vary in performance depending on the data structures and types of operations involved.
For example, using Vector3 may actually be slower for performing operations on small-sized vectors or simple scalar operations. Furthermore, SIMD extensions may not be supported on all hardware, making it unavailable in certain environments.
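A rough sketch of how one might check this on a given machine before drawing conclusions. `Vector.IsHardwareAccelerated` reports whether the JIT maps the System.Numerics types to SIMD instructions; `SimdProbe` and its method names are made up for illustration and are not part of DotRecast. The two distance functions compute the same value, so they can be dropped into a benchmark to compare the Vector3 and plain-float paths.

```csharp
using System;
using System.Numerics;

public static class SimdProbe
{
    // True when the JIT can map System.Numerics vector types to hardware SIMD instructions.
    public static bool IsAccelerated => Vector.IsHardwareAccelerated;

    // Distance computed through System.Numerics.Vector3.
    public static float DistanceSimd(Vector3 a, Vector3 b) => Vector3.Distance(a, b);

    // The same distance computed with plain float arithmetic.
    public static float DistanceScalar(float ax, float ay, float az, float bx, float by, float bz)
    {
        float dx = bx - ax, dy = by - ay, dz = bz - az;
        return MathF.Sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```

Logging `SimdProbe.IsAccelerated` at startup on each target environment is a cheap way to see which machines actually take the SIMD path.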
I couldn't find any discussion of the downsides. I can imagine a few cases where Vector3 would be slower, but overall I thought it would be worth it.
Could you please provide your environment? Could you provide me with the DLL that you've built? Would you like to try running the build artifacts on a different CPU environment?
I edited my comment in case it was misinterpreted.
See info
- https://github.com/dotnet/runtime/issues/63354
I've added a new branch with a version that uses SIMD. While it needs testing on various architectures, it has already increased performance by over 50% on my current laptop. I'll continue with more R&D, and once I'm confident it's safe, I'll plan to merge it into the main branch.
@GabrielMotaAlexandre
Thanks, great news.
Hello folks!
Any updates on the SIMD support?
We are using it on the server side for an unannounced MMO and results are good with this port. However, we would like to indeed leverage SIMD on this.
On that subject - are DtCrowdAgent.RequestMoveTarget()/.AddAgent()/.RemoveAgent() thread-safe?
Thanks!
- SIMD will be supported. However, we are currently working on fixing the SOH issue first.
- By default, RecastNavigation is not thread-safe, and the DotRecast port is not thread-safe either. When using it from multiple threads, either isolate each instance to a single thread or keep a Query pool so that every thread works with its own instance.
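To make the pool idea concrete, here is a minimal sketch of a generic instance pool. `InstancePool<T>` is a hypothetical helper, not a DotRecast type; it only guarantees that two callers never hold the same instance at the same time.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper, not part of DotRecast: hands each caller its own instance.
public sealed class InstancePool<T> where T : class
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public InstancePool(Func<T> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // Take an idle instance, or create a new one if the pool is empty.
    public T Rent() => _items.TryTake(out var item) ? item : _factory();

    // Hand the instance back once the caller is done with it.
    public void Return(T item) => _items.Add(item);
}
```

With DotRecast this might look like `new InstancePool<DtNavMeshQuery>(() => new DtNavMeshQuery(navMesh))`, with `Rent()` and `Return()` wrapped in try/finally. That assumes several queries can safely read one shared DtNavMesh while nothing modifies it, which should be verified for your setup.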
@galvesribeiro
Thanks for the reply @ikpil
We only use DtCrowd and NavQuery/NavMesh. In that case, I guess we should have a SemaphoreSlim(1,1) being used whenever we need to call Add/Remove agent and Update.
Is that enough to protect the Write operations but Read from any thread without "lock"?
Again, thanks for the great work on this port!
I haven't tested it, but this should give you the general idea. @galvesribeiro
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class DtCrowdManager
{
    private readonly DtCrowd _crowd;
    private readonly ConcurrentQueue<Action> _requests;

    public DtCrowdManager(DtCrowd crowd)
    {
        _crowd = crowd;
        _requests = new ConcurrentQueue<Action>();
    }

    // DtCrowdAgent - callers should only read from the returned agent.

    // AddAsync - thread-safe: the request is queued and executed inside Update().
    public Task<DtCrowdAgent> AddAsync(RcVec3f pos, DtCrowdAgentParams option)
    {
        var tcs = new TaskCompletionSource<DtCrowdAgent>(TaskCreationOptions.RunContinuationsAsynchronously);
        _requests.Enqueue(() =>
        {
            // Pass the caller's arguments through to the crowd.
            var ag = _crowd.AddAgent(pos, option);
            tcs.SetResult(ag);
        });
        return tcs.Task;
    }

    // RemoveAsync - thread-safe: the request is queued and executed inside Update().
    public Task<bool> RemoveAsync(DtCrowdAgent ag)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _requests.Enqueue(() =>
        {
            _crowd.RemoveAgent(ag);
            tcs.SetResult(true); // ...
        });
        return tcs.Task;
    }

    // Must be called from a single thread only.
    public void Update(float dt)
    {
        // Apply all pending add/remove requests before stepping the crowd.
        while (_requests.TryDequeue(out var action))
        {
            action.Invoke();
        }

        _crowd.Update(dt, null);
    }
}
ConcurrentQueue does a hard lock internally. So the Enqueue() call is essentially blocking the thread.
We use DotRecast in a server which is based on Microsoft Orleans. Locks in that context are extremely harmful. So I guess in our case the SemaphoreSlim would be better.
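For comparison, a minimal sketch of the SemaphoreSlim(1,1) idea. `AsyncGate<T>` is a hypothetical wrapper, not a DotRecast type. It waits with `WaitAsync`, so a caller that has to wait does not block its thread, and it runs one action at a time against the wrapped object.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: serializes every call that goes through it.
public sealed class AsyncGate<T> : IDisposable
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly T _target;

    public AsyncGate(T target) => _target = target;

    // Waits for the gate without blocking the calling thread, then runs the action.
    public async Task<TResult> RunAsync<TResult>(Func<T, TResult> action, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct);
        try
        {
            return action(_target);
        }
        finally
        {
            // Always release, even if the action throws.
            _gate.Release();
        }
    }

    public void Dispose() => _gate.Dispose();
}
```

With an `AsyncGate<DtCrowd>`, an add could be written as `await gate.RunAsync(c => c.AddAgent(pos, option))`, and the tick as a `RunAsync` call that invokes `Update`. Note that this only protects calls that go through the gate: reading agent state from another thread while Update is running stays unguarded unless those reads go through the gate too.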
Any update on this? I'm not very concerned about the performance difference between vector types, but having to convert all of my vectors anytime I interact with this library makes it a lot more painful than necessary. Most game- and graphics-related libraries are using System.Numerics at this point, and it can't be overstated how much more convenient things are when everything is consistent.
In conclusion, I plan to make the change around November 2024 when dotnet 6 support ends.
The issue here is a crash occurring due to memory corruption during SIMD operations in a specific environment.
Here's how I tested it:
- I built a DLL with dotnet 6 in release mode on Windows 64-bit with an AMD Ryzen 5600x.
- I copied this DLL to a Rocky Linux 9 64-bit guest running under Hyper-V on Windows 64-bit and ran it.
- At some point during specific SIMD operations, a crash occurred.
- Upon investigation, I found out that the crash happened during Vector3 SIMD operations.
So, I switched to using RcVec3f.
Here are similar reported issues:
- https://github.com/dotnet/runtime/issues/63354
- https://github.com/dotnet/runtime/pull/70141
The work to make the change is already complete, but I'm concerned that if we switch now, people still using early versions of dotnet 6 could experience crashes.
Thank you for maintaining a project that has been kept well updated to this day; it has been a great learning resource for me as well. I would like to ask one more question about the multi-threaded structure for server-side use.
On our server, entities (agents) are managed through space partitioning (quadtree or sectors). The update tick itself is called for each partitioned space, and agents are updated accordingly.
In the DtCrowd code and the code you provided, all agents are processed through GetActiveAgents within DtCrowd.
If agents are partitioned by space, what would be the best approach to handle this?
I would like to ask if it is structurally feasible to override DtCrowd's Update or GetActiveAgents to handle target agents.
Additionally, I found another post of yours on the Unity Forum, where you replied to a question similar to this one.
The first approach is to implement it directly.
- I used this method because I have a lot of monsters.
Another approach is to use a Crowd Manager with partitioning.
- If the partitions are well-defined, you can even run it with multiple threads.
It seems your approach is the direct one, without DtCrowd. Is that right?
You also mentioned partitioning for multiple threads. Could you explain both approaches in more detail?
If agents are partitioned by space, what would be the best approach to handle this?
Could you provide more details on what it means for agents to be partitioned by space?
To facilitate communication, let me provide a more detailed explanation of the situation.
- Entity: An entity is a unit of a moving object on my server and owns a NavMesh agent.
- Field: A unit that has one NavMesh map and a list of sectors.
- Sector: A logically divided grid cell (10m intervals, similar to a tile) to which an entity belongs. It is also the server's broadcast unit.
A Field containing a NavMesh is spatially partitioned by sectors at specific intervals, and entities belong to sectors based on their positions. The position of an entity is the same as the position of the CrowdAgent (reference) it holds (referenced).
The server schedules the sectors' Update calls according to the entities' range of influence, so that sectors which cannot affect one another are updated concurrently from multi-threaded tasks.
For example, with sectors numbered 1 to 100, 10 thread tasks call, within the same tick, the Update functions of sectors that do not influence each other. Each sector's Update function then calls the Update function of the entities in its area and performs the related processing.
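One way such a schedule could be sketched, assuming sectors form a regular grid and an entity can only affect its own sector and the eight sectors around it. `SectorScheduler` and `updateSector` are hypothetical names. Sectors are split into four phases by the parity of their coordinates, so two sectors updated in the same phase are never neighbours.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SectorScheduler
{
    // Runs updateSector once per sector of a width x height grid, in four phases.
    // Sectors within one phase share the parity of (x, y), so they are never
    // adjacent, not even diagonally.
    public static void UpdateAll(int width, int height, Action<int, int> updateSector)
    {
        for (int phase = 0; phase < 4; phase++)
        {
            int px = phase % 2, py = phase / 2;
            var batch = new List<(int X, int Y)>();
            for (int y = py; y < height; y += 2)
                for (int x = px; x < width; x += 2)
                    batch.Add((x, y));

            // Parallel.ForEach returns only after the whole batch has finished,
            // so phases never overlap.
            Parallel.ForEach(batch, s => updateSector(s.X, s.Y));
        }
    }
}
```

This only decides which sectors may run together. It says nothing about whether a single DtCrowd can be updated per sector, which is the open question here.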
Currently, DtCrowd performs logic in a single update tick with one DtCrowd instance (including a list of agents). In a situation like mine, where an update tick is needed for each sector (agents without influence range in the same tick), it seems impossible to split the update of agents in DtCrowd. Hence, I am asking for advice.
I was considering whether it is possible to use many of DtCrowd's features (steering, collision) in a multi-threaded environment, given that they handle agent interactions naturally.
If simple multi-threaded access to DtCrowd on a single DtNavMesh is difficult, I might need to make one of the following choices:
- Assume one thread per Field and call the DtCrowd object's Update from that thread.
- Skip collision handling between agents and use FindPath on multiple threads only when needed (for example, monster pathfinding). (I plan to create multiple DtNavMeshQuery instances, one per thread task, for a single Field's DtNavMesh and process queries on each thread.)
These are the ideas I have considered so far. If there are any scenarios or ideas you could share for using DotRecast and Detour efficiently on a server, whether multi-threaded or not, for an MMORPG with large numbers of users or large-scale wars, I would greatly appreciate your thoughts and advice.
The DtCrowd example shows how many features are interconnected, and it does not consider thread-safety.
There are many ways to develop crowd processing with multi-threading, but the methods I know do not deviate significantly from what @kaoraswoo mentioned. Overall, there are two main approaches:
1. Agent Splitting
1-1. Assign agents to fields, isolate the fields, and process them using multi-threading.
1-2. Dynamically group agents into clusters, isolate the clusters, and process them using multi-threading.
1-3. Let agents only read other agents, isolate each agent, and process them using multi-threading.
1-4. If there are agents numbered 1 to 100, group them into 1-10, 11-20, and so on, and process the groups using multi-threading (allowing some error).
2. Functionality Splitting
2-1. Process the functions that can be isolated using multi-threading.
In my experience, for development convenience and productivity, 1-1 and 1-4 are the easiest. I mainly develop using 1-1 and improve it using a profiler.
2-1 requires further research.
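A minimal sketch of how 1-4 could look when combined with the read-only idea from 1-3. It is not DtCrowd code: `ChunkedAgentUpdate` is a hypothetical helper, and the per-agent logic is a stand-in that just moves each agent toward its target. Agents are processed in index ranges; every task reads the previous tick's positions and writes only its own slots in a new array, so no locking is needed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Numerics;
using System.Threading.Tasks;

public static class ChunkedAgentUpdate
{
    // Moves every agent one step toward its target, processing index ranges in parallel.
    // Each task reads the shared 'current' array and writes only its own slots in 'next'.
    public static Vector3[] Step(Vector3[] current, Vector3[] targets, float maxStep, int chunkSize)
    {
        var next = new Vector3[current.Length];
        if (current.Length == 0) return next; // Partitioner.Create rejects empty ranges.

        Parallel.ForEach(Partitioner.Create(0, current.Length, chunkSize), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                Vector3 toTarget = targets[i] - current[i];
                float dist = toTarget.Length();
                next[i] = dist <= maxStep
                    ? targets[i]                                  // close enough: snap to target
                    : current[i] + toTarget * (maxStep / dist);   // otherwise advance by maxStep
            }
        });
        return next;
    }
}
```

The "some error" mentioned in 1-4 shows up here as staleness: every agent sees where the others were at the previous tick, not where they are moving during this one.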
Although the content is lacking, I hope it was helpful.
Another thing I had done which improves performance and readability was converting float collections to Vector3 collections. EG
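The example from the original comment is not included above. As a rough illustration only (not the commenter's code), a flat float buffer laid out as x, y, z triples could be viewed or copied as Vector3 values like this. `VertexViews` and its method names are hypothetical.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class VertexViews
{
    // Reinterprets a flat [x0, y0, z0, x1, y1, z1, ...] buffer as Vector3 values without copying.
    public static Span<Vector3> AsVector3(float[] xyz)
    {
        if (xyz.Length % 3 != 0)
            throw new ArgumentException("Length must be a multiple of 3.", nameof(xyz));
        return MemoryMarshal.Cast<float, Vector3>(xyz.AsSpan());
    }

    // Copies the same data into a new Vector3[] when an owned array is preferred.
    public static Vector3[] ToVector3Array(float[] xyz) => AsVector3(xyz).ToArray();
}
```

The span version avoids allocations but aliases the original array, so writes through it change the float buffer as well.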