
Improve benchmark performance

cBournhonesque opened this issue 8 months ago • 8 comments

Benchmarks show that it takes 1.3 ms to replicate 1000 entities (replicon takes 30us). Why?

With a lot of tracing spans, it's 3ms (because of the tracing overhead):

  • send_entity_spawn takes 530us

    • ReplicationSend::prepare_entity_spawn is 85us (because we are allocating; the overall memory should be re-used, but the hashmap keyed by the replication_group_id is probably very inefficient?)
  • send_component_update is 686us

    • ReplicationSend::prepare_component_insert is 178us
    • rest is probably iteration + serialization?
  • networking::send is 1.13ms

    • buffer_replication_message is 790us
      • finalize is 115us
      • then there is serializing (not tracked)
      • buffer_send_with_priority is 170us (buffering into the message manager)
    • send_packets is only 335us
      • message_manager.send_packets (that collects the messages to send from channels and builds the packets) is 190us
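
(For reference, these numbers come from tracing spans around each step. A minimal sketch of such a span, assuming the standard tracing crate rather than lightyear's exact instrumentation:)

```rust
use tracing::info_span;

fn send_entity_spawn(/* ... */) {
    // The span stays entered for the rest of this scope; the subscriber
    // records its elapsed time, which is what the numbers above measure.
    let _span = info_span!("send_entity_spawn").entered();
    // ... actual replication work ...
}
```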

Also here are the ChannelSendStats:

ChannelSendStats {
        num_single_messages_sent: 1000,
        num_fragment_messages_sent: 0,
        num_bytes_sent: 27000,
},
  • maybe it's not optimal to send a lot of individual messages, because we generate one MessageId per individual message?
  • here the stats don't even take the MessageId into account, it's just the raw message bytes. 27000 bytes / 1000 messages = 27 bytes per message, which seems pretty steep!
    • 1-2 bits for ReplicationMessage (but it shouldn't be needed because we are in the EntityActions Channel!)
    • group_id = u64 = 8 bytes
    • 1 bit for Action vs Updates
    • 2 bytes for MessageId (the "sequence id" for the replication group)
    • the length of the vec: encoded with gamma encoding so probably at most 1 byte (here 2 bits)
    • the entity: u64 so 8 bytes
    • SpawnAction: I think only 2 bits?
    • insert: 2 bits for the length, 2 bytes for the ComponentNetId, 4 bytes for the float
    • remove: 1 bit for the length of the empty hashset
    • updates: 1 bit for the length of the vec. Total: 24 bytes + 11 bits ≈ 26 bytes (I don't know where the extra 1 byte comes from). That's pretty steep. The main reason is that Entity is 8 bytes; maybe we could gamma-code it as a (index, generation) pair, since both the index and the generation should be pretty low (see the sketch below)? Another reason is that we encode both ReplicationId and Entity, which are the same here.
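
A quick sketch of that idea: encode the entity's index and generation as two separate variable-length integers, since both values are usually small. This is a hypothetical LEB128-style varint, not lightyear's actual wire format (gamma coding would be similar in spirit):

```rust
/// Hypothetical varint (LEB128) encoder: small values take 1 byte instead of 8.
fn write_varint(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // set the continuation bit
    }
}

/// Encode an entity as (index, generation) instead of one raw u64:
/// for index = 42, generation = 1 this takes 2 bytes instead of 8.
fn write_entity(out: &mut Vec<u8>, index: u32, generation: u32) {
    write_varint(out, index as u64);
    write_varint(out, generation as u64);
}
```

For 1000 freshly spawned entities the index fits in 1-2 bytes and the generation in 1 byte, so this alone could shave roughly 5 bytes off the 27-byte message.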

Potential ideas:

  • send_entity_spawn

    • uses a double hashmap to store data. In particular the allocated memory of the second hashmap cannot be recovered!
  • networking::send

    • buffer_replication_message
      • we serialize twice because of bitcode quirks currently
      • we allocate new EntityActions/EntityUpdates message instead of re-using existing ones
      • serialize directly into a cursor without intermediate data structures? I'm not sure that's possible if we want to keep the ReplicationGroup guarantees, which replicon doesn't have. Creating some entities ahead of time to re-use allocations in prepare_component_insert seems to bring a small (5%?) improvement. But since we already buffer the per-replication-group data, the final message could be written manually into a cursor (see the sketch after this list)
      • replicon writes all entity-actions into one message, which might become big and have to be split up (bad under packet loss). We have one message per ReplicationGroup. That's also why replicon can write into a cursor efficiently: it writes all the despawns (with entity), then all the removals (with entity), then all the insertions (with entity)
      • should we just update our message-packing?
      • all the component updates for an entity are iterated through sequentially (which shouldn't make a difference for this benchmark), so they can be serialized directly in order?
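
A rough sketch of the buffer re-use + cursor idea mentioned above. All names here are hypothetical (not lightyear's actual types); the point is keeping one scratch Vec per connection alive across ticks and clearing it, instead of allocating a fresh EntityActions message every time:

```rust
use std::io::{Cursor, Write};

/// Hypothetical scratch space kept per connection so the allocation
/// survives across ticks; only its contents are cleared on each send.
struct ReplicationScratch {
    buf: Vec<u8>,
}

impl ReplicationScratch {
    /// Write one replication-group message into the re-used buffer.
    /// `actions` stands for the per-group data that was already buffered.
    fn serialize_group(&mut self, group_id: u64, actions: &[u8]) -> std::io::Result<&[u8]> {
        self.buf.clear(); // keep the capacity, drop the old contents
        let mut cursor = Cursor::new(&mut self.buf);
        cursor.write_all(&group_id.to_le_bytes())?; // could be a varint as discussed above
        cursor.write_all(&(actions.len() as u16).to_le_bytes())?; // length prefix
        cursor.write_all(actions)?; // the pre-buffered actions/updates
        Ok(&self.buf)
    }
}
```

Since the per-group data is already buffered before this step, writing the final message this way avoids building an intermediate EntityActions value at all, at the cost of hand-rolling the wire layout.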

cBournhonesque · May 30 '24 04:05