lightyear
Improve benchmark performance
Benchmarks show that replicating 1000 entities takes 1.3 ms (replicon takes 30 µs). Why?
With a lot of tracing spans enabled, it takes 3 ms (because of the tracing overhead):
- send_entity_spawn takes 530 µs
  - ReplicationSend::prepare_entity_spawn takes 85 µs (because we are allocating; the overall memory should be re-used, though. Also, the hashmap inside the replication_group_id is probably very inefficient?)
- send_component_update takes 686 µs
  - ReplicationSend::prepare_component_insert takes 178 µs
  - the rest is probably iteration + serialization?
- networking::send takes 1.13 ms
  - buffer_replication_message takes 790 µs
    - finalize takes 115 µs
    - then there is serialization (not tracked)
    - buffer_send_with_priority takes 170 µs (buffering into the message manager)
  - send_packets takes only 335 µs
    - message_manager.send_packets (which collects the messages to send from the channels and builds the packets) takes 190 µs
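As a sanity check on how the networking::send spans nest, the sketch below encodes the measured timings as a tree and computes how much of each span is not covered by its tracked child spans. The `Span` type and the `networking_send` and `report` helpers are hypothetical, written only for this arithmetic, and the nesting is an assumption inferred from the numbers (790 + 335 = 1125 µs ≈ 1.13 ms).

```rust
/// A hypothetical tree of tracing-span timings, in microseconds.
struct Span {
    name: &'static str,
    micros: u32,
    children: Vec<Span>,
}

impl Span {
    fn leaf(name: &'static str, micros: u32) -> Span {
        Span { name, micros, children: Vec::new() }
    }

    /// Sum of the durations of the direct child spans.
    fn children_total(&self) -> u32 {
        self.children.iter().map(|c| c.micros).sum()
    }

    /// Time inside this span that no tracked child span accounts for.
    fn untracked(&self) -> u32 {
        self.micros.saturating_sub(self.children_total())
    }
}

/// The networking::send measurements, nested as assumed above.
fn networking_send() -> Span {
    Span {
        name: "networking::send",
        micros: 1130,
        children: vec![
            Span {
                name: "buffer_replication_message",
                micros: 790,
                children: vec![
                    Span::leaf("finalize", 115),
                    Span::leaf("buffer_send_with_priority", 170),
                ],
            },
            Span {
                name: "send_packets",
                micros: 335,
                children: vec![Span::leaf("message_manager.send_packets", 190)],
            },
        ],
    }
}

/// Print each span; non-leaf spans also show the time not covered by their children.
fn report(span: &Span, depth: usize) {
    let indent = "  ".repeat(depth);
    if span.children.is_empty() {
        println!("{}{}: {} us", indent, span.name, span.micros);
    } else {
        println!(
            "{}{}: {} us ({} us untracked)",
            indent,
            span.name,
            span.micros,
            span.untracked()
        );
    }
    for child in &span.children {
        report(child, depth + 1);
    }
}

fn main() {
    report(&networking_send(), 0);
}
```

Under that assumed nesting, about 505 µs of buffer_replication_message is covered by neither finalize nor buffer_send_with_priority, which would be consistent with the untracked serialization being a large share of it.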
Also, here are the ChannelSendStats:
```
ChannelSendStats {
    num_single_messages_sent: 1000,
    num_fragment_messages_sent: 0,
    num_bytes_sent: 27000,
},
```
- Maybe it's not optimal to send a lot of individual messages, because we generate one MessageId per individual message?
- These stats don't even account for that per-message MessageId; they only count the raw message bytes. 27 bytes per message (27000 / 1000) seems pretty steep! The estimated breakdown:
  - 1-2 bits for ReplicationMessage (but it shouldn't be needed, because we are already in the EntityActions channel!)
  - group_id: u64 = 8 bytes
  - 1 bit for Action vs Updates
  - 2 bytes for MessageId (the "sequence id" for the replication group)
  - the length of the vec: gamma-encoded, so probably about 1 byte in general (here 2 bits)
  - the entity: u64, so 8 bytes
  - SpawnAction: I think only 2 bits?
  - insert: 2 bits for the length, 2 bytes for the ComponentNetId, 4 bytes for the float
  - remove: 1 bit for the length of the empty hashset
  - updates: 1 bit for the length of the vec
- Total: 24 bytes + 11 bits, i.e. 26 bytes once rounded up to a whole byte (I don't know where the extra 1 byte comes from). That's pretty steep. The main reason is that Entity is 8 bytes. Maybe we could gamma-code it as two separate integers (index and generation), since both should be pretty low? Another reason is that we encode both the ReplicationId and the Entity, which are the same here.
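The size estimate and the gamma-coding idea can be checked with a little arithmetic. This sketch tallies the per-field estimates from the breakdown (assuming ReplicationMessage costs 2 bits, the upper end of "1-2 bits"), rounds up to whole bytes, and compares a fixed u64 entity with a plain Elias gamma code of its index and generation. The split into a u32 index and a u32 generation is an assumption modeled on Bevy's `Entity`; `FIELDS`, `gamma_bits` and `gamma_encode` are hypothetical names, not lightyear or bitcode APIs.

```rust
/// Estimated size of each field of one spawn message, in bits (from the breakdown above).
/// Assumption: ReplicationMessage is counted as 2 bits.
const FIELDS: &[(&str, u32)] = &[
    ("ReplicationMessage", 2),
    ("group_id: u64", 64),
    ("Action vs Updates", 1),
    ("MessageId", 16),
    ("vec length", 2),
    ("entity: u64", 64),
    ("SpawnAction", 2),
    ("insert: length", 2),
    ("insert: ComponentNetId", 16),
    ("insert: f32", 32),
    ("remove: length", 1),
    ("updates: length", 1),
];

/// Measured average from ChannelSendStats: 27000 bytes / 1000 messages.
const MEASURED_BYTES_PER_MESSAGE: u32 = 27_000 / 1_000;

fn total_bits() -> u32 {
    FIELDS.iter().map(|&(_, bits)| bits).sum()
}

/// Bits rounded up to whole bytes.
fn bytes_rounded_up(bits: u32) -> u32 {
    (bits + 7) / 8
}

/// Length of the Elias gamma code of n + 1 (shifted by one so that 0 is encodable).
fn gamma_bits(n: u32) -> u32 {
    let x = n as u64 + 1;
    let significant = 64 - x.leading_zeros(); // number of significant bits in x
    2 * significant - 1
}

/// Append the Elias gamma code of n + 1: (k - 1) zeros, then the k significant bits, MSB first.
fn gamma_encode(n: u32, out: &mut Vec<bool>) {
    let x = n as u64 + 1;
    let significant = 64 - x.leading_zeros();
    for _ in 1..significant {
        out.push(false);
    }
    for i in (0..significant).rev() {
        out.push((x >> i) & 1 == 1);
    }
}

fn main() {
    let bits = total_bits();
    println!(
        "estimate: {} bits = {} bytes (measured: {} bytes)",
        bits,
        bytes_rounded_up(bits),
        MEASURED_BYTES_PER_MESSAGE
    );

    // Hypothetical entity: index 999 (the last of 1000 entities), generation 1.
    let (index, generation) = (999u32, 1u32);
    let mut code = Vec::new();
    gamma_encode(index, &mut code);
    gamma_encode(generation, &mut code);
    println!(
        "entity as u64: 64 bits; gamma-coded (index, generation): {} bits (= {} + {})",
        code.len(),
        gamma_bits(index),
        gamma_bits(generation)
    );
}
```

With these assumptions the estimate comes to 203 bits, i.e. 26 bytes, one byte short of the measured 27. Gamma-coding index 999 and generation 1 takes 22 bits instead of 64, but the worst case (an index of `u32::MAX`) takes 65 bits, so this only pays off if indices and generations really do stay small.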
Potential ideas:
- send_entity_spawn
  - uses a double hashmap to store data. In particular, the memory allocated for the second (inner) hashmap cannot be recovered!
- networking_send
  - buffer_replication_message
    - we serialize twice because of current bitcode quirks
    - we allocate new EntityActions/EntityUpdates messages instead of re-using existing ones
    - serialize directly into a cursor without intermediate data structures? I'm not sure that's possible if we want to keep the ReplicationGroup guarantees, which replicon doesn't have. But since we already buffer the per-replication-group data, the final message could be written manually using a cursor.
    - creating some entities ahead of time to re-use allocations in prepare_component_insert seems to bring a small (5%?) improvement
    - replicon writes all entity actions into one message, which might become big and have to be split up (bad under packet loss), whereas we have one message per ReplicationGroup. That's also why replicon can write into a cursor efficiently: it writes all the despawns (with entity), then all the removals (with entity), then all the insertions (with entity).
    - should we just improve our message packing?
    - all the component updates for an entity are iterated through sequentially (which shouldn't make a difference for this benchmark), so could they be serialized directly in order?
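Several of the ideas above (one flat map instead of a double hashmap, re-using allocations instead of building new messages, and writing the final message with a cursor in replicon's despawns, then removals, then insertions order) can be sketched together. Everything here is hypothetical: `GroupBuffer`, `ReplicationBuffer` and the type aliases are stand-ins, not lightyear's API; the wire format is fixed-width little-endian with no bit packing; and appending to a reused `Vec<u8>` plays the role of the cursor. It still emits one message per group, so the ReplicationGroup guarantees are not merged away as in replicon.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for lightyear types.
type GroupId = u64;
type Entity = u64; // e.g. the bits of a Bevy `Entity`
type ComponentNetId = u16;

/// Per-group buffers. The Vecs are cleared, not dropped, after each send,
/// so their allocations are reused on the next tick.
#[derive(Default)]
struct GroupBuffer {
    despawns: Vec<Entity>,
    removals: Vec<(Entity, ComponentNetId)>,
    // (entity, component id, length of its serialized bytes in `insert_bytes`)
    inserts: Vec<(Entity, ComponentNetId, u32)>,
    insert_bytes: Vec<u8>,
}

impl GroupBuffer {
    fn despawn(&mut self, e: Entity) {
        self.despawns.push(e);
    }
    fn remove(&mut self, e: Entity, c: ComponentNetId) {
        self.removals.push((e, c));
    }
    fn insert(&mut self, e: Entity, c: ComponentNetId, bytes: &[u8]) {
        self.inserts.push((e, c, bytes.len() as u32));
        self.insert_bytes.extend_from_slice(bytes);
    }
    fn clear(&mut self) {
        self.despawns.clear();
        self.removals.clear();
        self.inserts.clear();
        self.insert_bytes.clear();
    }
}

/// A single map from group to buffers (instead of a map of maps), plus a reusable output buffer.
#[derive(Default)]
struct ReplicationBuffer {
    groups: HashMap<GroupId, GroupBuffer>,
    out: Vec<u8>,
}

impl ReplicationBuffer {
    fn group(&mut self, id: GroupId) -> &mut GroupBuffer {
        self.groups.entry(id).or_default()
    }

    /// Write one group's message: the group id, then all despawns, all removals and all
    /// inserts, each section prefixed by a u32 count. Clears the group but keeps capacity.
    fn finish_group(&mut self, id: GroupId) -> &[u8] {
        self.out.clear();
        if let Some(g) = self.groups.get_mut(&id) {
            let out = &mut self.out;
            out.extend_from_slice(&id.to_le_bytes());
            out.extend_from_slice(&(g.despawns.len() as u32).to_le_bytes());
            for e in &g.despawns {
                out.extend_from_slice(&e.to_le_bytes());
            }
            out.extend_from_slice(&(g.removals.len() as u32).to_le_bytes());
            for (e, c) in &g.removals {
                out.extend_from_slice(&e.to_le_bytes());
                out.extend_from_slice(&c.to_le_bytes());
            }
            out.extend_from_slice(&(g.inserts.len() as u32).to_le_bytes());
            let mut offset = 0usize;
            for &(e, c, len) in &g.inserts {
                out.extend_from_slice(&e.to_le_bytes());
                out.extend_from_slice(&c.to_le_bytes());
                out.extend_from_slice(&len.to_le_bytes());
                let end = offset + len as usize;
                out.extend_from_slice(&g.insert_bytes[offset..end]);
                offset = end;
            }
            g.clear();
        }
        &self.out
    }
}

fn main() {
    let mut buf = ReplicationBuffer::default();
    buf.group(7).despawn(1);
    buf.group(7).remove(2, 5);
    buf.group(7).insert(3, 4, &1.5f32.to_le_bytes());
    let message_len = buf.finish_group(7).len();
    println!("message for group 7: {} bytes", message_len);
    // The group's Vecs are now empty but keep their capacity for the next tick.
    println!("despawns capacity after send: {}", buf.groups[&7].despawns.capacity());
}
```

Because `finish_group` only calls `clear()`, the next tick writes into the same allocations. The trade-off is that entries for groups that stop replicating are never freed in this sketch, so a real version would need some eviction policy.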