# Make scene handling of entity references robust
## Objective

- Handle dangling entity references inside scenes
- Handle references to entities with generation > 0 inside scenes
- Fix a latent bug in `Parent`'s `MapEntities` implementation, which would, if the parent was outside the scene, cause the scene to be loaded into the new world with a parent reference potentially pointing to some random entity in that new world.
- Fixes #4793 and addresses #7235
## Solution

- `DynamicScene`s now identify entities with an `Entity` instead of a `u32`, therefore including generation.
- `World` exposes a new `reserve_generations` function that despawns an entity and advances its generation by some extra amount.
- `MapEntities` implementations have a new `get_or_reserve` function available that will always return an `Entity`, establishing a new mapping to a dead entity when the entity they are called with is not in the `EntityMap`. Subsequent calls with that same `Entity` will return the same newly created dead entity reference, preserving equality semantics.
- As a result, after loading a scene containing references to dead entities (or entities otherwise outside the scene), those references will all point to different generations on a single entity id in the new world.
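The `get_or_reserve` behavior described above can be illustrated with a minimal toy model. This is not the actual `bevy_ecs` implementation; `ToyMapper` and its plain `u64` ids are stand-ins purely to show the semantics: known entities map through the table, unknown entities get a fresh "dead" id, and repeated lookups of the same source entity stay stable.

```rust
use std::collections::HashMap;

/// Toy model of the mapper semantics (hypothetical type, not the real
/// bevy_ecs API): entity ids are plain u64s for simplicity.
struct ToyMapper {
    map: HashMap<u64, u64>,
    next_dead: u64, // pretend ids >= this value are reserved dead entities
}

impl ToyMapper {
    fn new(map: HashMap<u64, u64>, dead_start: u64) -> Self {
        ToyMapper { map, next_dead: dead_start }
    }

    /// Always returns a mapped id, reserving a new "dead" id on a miss.
    /// A second call with the same source id returns the same dead id,
    /// which is what preserves equality semantics across references.
    fn get_or_reserve(&mut self, source: u64) -> u64 {
        let next = &mut self.next_dead;
        *self.map.entry(source).or_insert_with(|| {
            let id = *next;
            *next += 1;
            id
        })
    }
}
```

Two references to the same dead entity in the source world thus still compare equal after mapping, while distinct dead references map to distinct ids.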
## Changelog

### Changed

- In serialized scenes, entities are now identified by a `u64` instead of a `u32`.
- In serialized scenes, components with entity references now have those references serialize as `u64`s instead of structs.

### Fixed

- Scenes containing components with entity references will now deserialize and add to a world reliably.
## Migration Guide

- `MapEntities` implementations must change from a `&EntityMap` parameter to a `&mut EntityMapper` parameter and can no longer return a `Result`. Finally, they should switch from calling `EntityMap::get` to calling `EntityMapper::get_or_reserve`.
Much better docs, and I really like the closure-based API. Why not bake this in directly to `EntityMap`? It's unclear when you would be okay to forgo the guarantees provided here.
I'm not sure I see a good way to do that which both preserves safety and doesn't make the API more awkward. Baked into `EntityMap`, we either have an extra bit of state (has a `dead_start` or doesn't, making `dead_start` have to be `Option<Entity>`), or we push the initialization and saving out to the code setting up the `EntityMap`. In the first case, the result would be `get_or_alloc` either panicking or returning `Option<Entity>` instead of `Entity`. In the second, we can't use the closure approach to ensure safety anymore. Also, in the first case, it becomes possible to bypass `with_mapper`.
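The closure-based safety argument can be sketched generically. The names here (`Allocator`, `ScopedMapper`, `with_mapper`) are hypothetical stand-ins, not the real `EntityMapper` API: the point is that the dead-id cursor only exists inside the scope, so it is always initialized from the allocator before use and always committed back afterwards, and callers cannot retain a mapper to bypass the scope.

```rust
/// Hypothetical source of fresh ids, standing in for world entity state.
pub struct Allocator {
    pub next_free: u64,
}

/// Scoped view that hands out reserved "dead" ids; only constructible
/// inside `with_mapper`, so its state can never leak or go stale.
pub struct ScopedMapper {
    dead_start: u64,
    allocated: u64,
}

impl ScopedMapper {
    /// Hands out the next reserved id within this scope.
    pub fn reserve(&mut self) -> u64 {
        let id = self.dead_start + self.allocated;
        self.allocated += 1;
        id
    }
}

/// Initializes a mapper from the allocator, runs `f`, then saves the
/// reservations back, so no other allocation can interleave with the scope.
pub fn with_mapper<R>(alloc: &mut Allocator, f: impl FnOnce(&mut ScopedMapper) -> R) -> R {
    let mut mapper = ScopedMapper { dead_start: alloc.next_free, allocated: 0 };
    let result = f(&mut mapper);
    alloc.next_free = mapper.dead_start + mapper.allocated;
    result
}
```

Making the fields private and construction scoped is what rules out the `Option<Entity>` state and the bypass problem described above.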
Making this controversial because this is taking scene ids in a new-ish direction. Scenes should have "virtual self-contained identity spaces" that don't (necessarily) correspond to real world entities. Introducing generation (especially for "top level" entities in the scene) makes the scene format longer / harder to read and compose (just look at the diffs in the scene files). Generation isn't a useful part of the "scene mental model". Introducing generation also encourages people to think about "runtime ids" as the same thing as scene ids.
There is definitely an issue in the current approach when it comes to uniqueness. `(id: 4, gen: 0)` and `(id: 4, gen: 1)` should not be considered the same thing. And dangling ids should be accounted for.
Some alternative ideas to consider:

- Serialize scene entity ids as `u64`s (laid out as `[id_u32][generation_u32]`). Treat that as just a simpler encoding of `(id: u32, generation: u32)`.
- When serializing, just create a brand new `Entity -> SceneId` mapping and allocate new `SceneId`s as new entities are encountered.

Curious what the reflection SMEs think: @MrGVSV @jakobhellermann
> When serializing, just create a brand new `Entity -> SceneId` mapping and allocate new `SceneId`s as new entities are encountered.
This is very difficult to do because we currently can't look inside components with entity references when serializing, and so we can't allocate `SceneId`s for dangling references nor serialize component entity references as `SceneId`s. It would be a much, much larger change to start doing that. The entities in dangling references are only encountered when adding a deserialized scene to the world and no sooner (after both serialization and deserialization).
> Serialize scene entity ids as `u64`s (laid out as `[id_u32][generation_u32]`). Treat that as just a simpler encoding of `(id: u32, generation: u32)`.
Amusingly, this is actually what the initial version of this PR did (well, almost: it was `[generation_u32][id_u32]` since it used the existing `Entity::to_bits()`). Check out the now-resolved thread off the first comment alice left on `dynamic_scene.rs`.
I think it makes sense for `Entity` to serialize like this everywhere, so that both component entity references and the top-level entity IDs serialize as `u64`s. Though, I was going to put that into a separate PR.
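A standalone sketch of that packing may help. The exact layout of `Entity::to_bits` is an assumption here (the thread says generation landed in the high half in the initial version); the point is that one `u64` uniquely encodes every `(index, generation)` pair:

```rust
/// Pack an (index, generation) pair into one u64: generation in the high
/// 32 bits, index in the low 32 bits. Layout is illustrative; the real
/// `Entity::to_bits` may differ in detail.
fn pack(index: u32, generation: u32) -> u64 {
    ((generation as u64) << 32) | index as u64
}

/// Recover the (index, generation) pair from the packed bits.
fn unpack(bits: u64) -> (u32, u32) {
    (bits as u32, (bits >> 32) as u32)
}
```

Because the encoding is injective, `(id: 4, gen: 0)` and `(id: 4, gen: 1)` map to distinct `u64`s, which is exactly the uniqueness property the discussion above asks for.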
I went ahead and added a commit that goes back to serializing entity identifiers as `u64`s, but via changing the implementation of `Serialize` and `Deserialize` on `Entity` instead of the earlier approach of holding a `u64` in `DynamicEntity`. This way, the API improvements in `DynamicScene` are retained and component entity references are also similarly affected.
> I went ahead and added a commit that goes back to serializing entity identifiers as `u64`s, but via changing the implementation of `Serialize` and `Deserialize` on `Entity` instead of the earlier approach of holding a `u64` in `DynamicEntity`. This way, the API improvements in `DynamicScene` are retained and component entity references are also similarly affected.
Well, as cart explained above and I coincidentally tried to explain on Discord, using `Entity` instead of `u64` is confusing because it makes it look like we somewhat care about the exact `Entity` value. We don't; we just need a unique identifier, and keeping `u64` is clearer in my opinion.
> This is very difficult to do because we currently can't look inside components with entity references when serializing, and so we can't allocate SceneIds for dangling references nor serialize component entity references as SceneIds.
Can't we bake that logic into the serialization implementation of `Entity`?
> Can't we bake that logic into the serialization implementation of Entity?
Not without very significant complexity and tradeoffs, as far as I can tell. To map within serialization would require extra state on the serializer used within `Entity` in order to store the mapping table, which in turn means that we couldn't implement the `serde` traits. That would have major fallout, causing any component with an `Entity` in it to also be unable to implement `Serialize` or `Deserialize`, and bevy would basically have to roll its own serialization. This is made especially worse because components don't need to be serialized directly via bevy reflection; they can delegate to `Serialize`/`Deserialize`.
The alternatives would be using thread-local statics inside the serde implementations on `Entity`, which feels like a dangerous hack, or perhaps using something similar to https://crates.io/crates/serde_state (which itself unfortunately appears to be an unmaintained fork of serde). Serde itself has `DeserializeSeed`, which is stateful, but it doesn't look like it's quite suited for this use case. Regardless, it's a much bigger change.
Also, if we bake this into the serialization, then `DynamicEntity` is still gonna contain an `Entity` instead of a `u64`, since the mapping to a new ID space would happen after it's constructed.
> patching after serialization.
We support several different serialized formats (ron, messagepack, postcard, bincode). This seems nightmarish.
> Look like we somewhat care about the exact Entity value. We don't;
Currently, we actually do. The components contained in that `Vec<Box<dyn Reflect>>` just underneath `pub entity: Entity` in `DynamicEntity` can contain `Entity`s inside them, and they have to exactly match the top-level entity identifiers in the scene.
I'm not a huge fan of doubling the size of all serialized entities just to guarantee unicity inside a serialized blob.
If worlds can contain `u32::MAX` entities with `u32::MAX` generations on each of them, then you need just as many bits in the scene format if you want to be able to save every possible world.
> which in turn means that we couldn't implement the serde traits.
We already have the problem with deserialization though, where we need the type registry for example to correctly deserialize dynamic objects. So are we gaining much by restricting ourselves in the serialization side only?
> We support several different serialized formats (ron, messagepack, postcard, bincode). This seems nightmarish.
I didn't mean patching the final blob, but rather the intermediate memory representation in serde's data model, which is independent of the format. But maybe that's not possible?
> Currently, we actually do. The components contained in that `Vec<Box<dyn Reflect>>` just underneath `pub entity: Entity` in `DynamicEntity` can contain `Entity`s inside them, and they have to exactly match the top-level entity identifiers in the scene.
As far as I understand, that's incorrect: the serialized representation of the `Entity` reference in the serialized component has to match that of the serialized `Entity` itself. But nothing forces that representation to be `(id, generation)`. It could be any unique identifier, even a linearly allocated index. It's just that `[id][generation]` is a convenient candidate because it can be derived from the `Entity` without further context. A linearly allocated index or a GUID would require stateful serialization.
> If worlds can contain u32::MAX entities with u32::MAX generations on each of them, then you need just as many bits in the scene format if you want to be able to save every possible world.
But that's just unrealistic. Except for stale references, which I hope are a minority, each ID is present as a single generation in the `World` at a given time. So a full `u64` is overkill.
> > which in turn means that we couldn't implement the serde traits.
>
> We already have the problem with deserialization though, where we need the type registry for example to correctly deserialize dynamic objects. So are we gaining much by restricting ourselves in the serialization side only?
The type registry is passed around in both serialization and deserialization, and the extra state represented by a mapping table could be too. Where it isn't passed is where `bevy_reflect` delegates to serde implementations via `reflect_value`. What you'd be losing is `reflect_value` support for any component containing an entity reference. That seems to me to be a rather enormous loss. `bevy_reflect` would also have to become explicitly aware of this mapping and introduce special handling for `Entity` (which right now is itself handled via `reflect_value`), representing a good bit of new complexity and some extra coupling (at the moment, `bevy_reflect` doesn't even reference `Entity` anywhere). If we go this route, a thread-local static is probably the better solution, ugly as it would be. That, or working with serde to try to upstream some kind of better stateful (de)serialization support. The upshot of this would be making `MapEntities` unnecessary, though.
> > We support several different serialized formats (ron, messagepack, postcard, bincode). This seems nightmarish.
>
> I didn't mean patching the final blob, but rather the intermediate memory representation in serde's data model, which is independent of the format. But maybe that's not possible?
I'm not sure whether or not this is possible. I suspect it isn't because I don't think there will be enough type information to identify entity references.
> Currently, we actually do. The components contained in that `Vec<Box<dyn Reflect>>` just underneath `pub entity: Entity` in `DynamicEntity` can contain `Entity`s inside them, and they have to exactly match the top-level entity identifiers in the scene.
> As far as I understand that's incorrect, the serialized representation of the `Entity` reference in the serialized component has to match the one of the serialized `Entity` itself. But nothing forces that representation to be `(id, generation)`. It could be any unique identifier, even a linearly allocated index. It's just that `[id][generation]` is a convenient candidate because it can be derived from the `Entity` without further context. A linearly allocated index or a GUID would require a stateful serialization.
Here I am talking about `DynamicScene` and `DynamicEntity`, which are a pre-serialized representation. `DynamicEntity` should hold an `Entity` as its ID because the `dyn Reflect` objects in its component list hold `Entity` in their fields. Any mapping would happen after this point. Right now it's far downstream in `MapEntities`, but in some world where the mapping is handled in (de)serialization, `DynamicScene` is going to be before that mapping in the serialization case and after the corresponding mapping in the deserialization case. Thus, it should hold `Entity`.
> > If worlds can contain u32::MAX entities with u32::MAX generations on each of them, then you need just as many bits in the scene format if you want to be able to save every possible world.
>
> But that's just unrealistic. Except for stale references which I hope are a minority, each ID is present as a single generation in the `World` at a given time. So a full `u64` is overkill.
A world could potentially contain `u32::MAX` stale references to different generations on the same index. I think that if you can represent it in a world, you should be able to represent it in a scene. It's unrealistic to actually have `u32::MAX` entities, but is that an argument for using a `u16` in the scene representation? The fact of the matter is that the ID space for entities consists of `u64::MAX` possible IDs. And really, I don't think anything outside `bevy_ecs` should be looking at generation or index anyway, rendering `Entity` a 64-bit wide identifier. The stale reference use case cannot be sidelined, and the ID space of references is 64 bits wide due to their ability to refer to dead entities, despite the ID space of living entities being only 32 bits wide.
I've spent a few hours reading code and contemplating things and have some further thoughts.
> > > which in turn means that we couldn't implement the serde traits.
> >
> > We already have the problem with deserialization though, where we need the type registry for example to correctly deserialize dynamic objects. So are we gaining much by restricting ourselves in the serialization side only?
>
> The type registry is passed around in both serialization and deserialization, and the extra state represented by a mapping table could be too. Where it isn't passed is where `bevy_reflect` delegates to serde implementations via `reflect_value`. What you'd be losing is `reflect_value` support for any component containing an entity reference. That seems to me to be a rather enormous loss. `bevy_reflect` would also have to become explicitly aware of this mapping and introduce special handling for `Entity` (which right now is itself handled via `reflect_value`), representing a good bit of new complexity and some extra coupling (at the moment, `bevy_reflect` doesn't even reference `Entity` anywhere). If we go this route a thread-local static is probably the better solution, ugly as it would be. That or working with serde to try to upstream some kind of better stateful (de)serialization support. The upshot of this would be making `MapEntities` unnecessary though.
The thread-local solution actually doesn't seem so bad, though it has some tradeoffs. One of those is a good deal more internal complexity, though it would make it possible to remove the need for `MapEntities`.
> > If worlds can contain u32::MAX entities with u32::MAX generations on each of them, then you need just as many bits in the scene format if you want to be able to save every possible world.
> >
> > But that's just unrealistic. Except for stale references which I hope are a minority, each ID is present as a single generation in the `World` at a given time. So a full `u64` is overkill.
>
> A world could potentially contain u32::MAX stale references to different generations on the same index. I think that if you can represent it in a world you should be able to represent it in a scene. It's unrealistic to actually have u32::MAX entities, but is that an argument for using a u16 in the scene representation? The fact of the matter is that the ID space for entities consists of u64::MAX possible IDs. And really, I don't think anything outside `bevy_ecs` should be looking at generation or index anyway, rendering `Entity` a 64-bit wide identifier. The stale reference use case cannot be sidelined, and the ID space of references is 64 bits wide due to their ability to refer to dead entities, despite the id space of living entities being only 32 bits wide.
I've realized that what I wrote here is false, or at least misleading. The approach in this PR allows for only `u32::MAX - 1` dead references before the allocation method in `EntityMapper` runs out of space, not the full 64 bits of ID space `Entity` represents. Second, because it's true that `u32` is enough to represent the space of living entities, it should be enough for the `entity` field on `DynamicEntity`. More bits would be necessary only for component entity references. I do think that the ability to match the ID space of a world is a plus factor for potential solutions.
The existing `MapEntities` trait turns out to be enough to do the needed mapping if it's used between `DynamicScene` construction and scene serialization, and this can probably be done with small-ish changes to this PR.
### Alternative: Thread local (de)serializer state
The thread local approach seems worth prototyping, and I have an idea for it that seems to have a decent balance of tradeoffs, one of the largest being that it adds a lot of complexity internal to bevy. The sketch of it is as follows:
- Scenes represent top-level entity IDs as `u32`
- Component entity references are represented as `i64`, where positive numbers are living entities and negative are dead
  - `i32` could also be used, but this has further tradeoffs
- `Entities` gets a new method that gets an allocator struct, capable of allocating new living and dead entity IDs that can be later saved back to the original `Entities` struct so long as no other allocations have happened in-between
Serialization would look roughly like:
- Top-level `Entity` IDs are assigned to integer IDs ascending from 0
- They are then used to initialize an entity map, mapping from source world `Entity` IDs to `i64` destination IDs
- The map is saved to a thread local `RefCell`
- Serialization happens
  - Inside `Entity`'s `Serialize` implementation, component entity refs get mapped to existing IDs in the map, and new IDs are allocated descending from -1 if no entry is present
- The thread local is unset
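The mapping step in the serialization sketch above can be prototyped without serde. Everything here is hypothetical (in the real proposal the lookup would sit inside `Entity`'s `Serialize` implementation, and keys would be entity bits rather than bare `u64`s); it just demonstrates the thread-local table with dead IDs descending from -1:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical per-thread mapping state: source entity bits -> scene id.
struct MapState {
    map: HashMap<u64, i64>,
    next_dead: i64, // dead references count down from -1
}

thread_local! {
    static ENTITY_MAP: RefCell<Option<MapState>> = RefCell::new(None);
}

/// Install the mapping before serialization starts; top-level entities
/// would already have been assigned ids ascending from 0 at this point.
fn set_map(map: HashMap<u64, i64>) {
    ENTITY_MAP.with(|m| *m.borrow_mut() = Some(MapState { map, next_dead: -1 }));
}

/// What the `Serialize` impl would do for a component entity reference:
/// reuse an existing id, or allocate the next dead id for a reference
/// that points outside the scene.
fn map_ref(entity_bits: u64) -> i64 {
    ENTITY_MAP.with(|m| {
        let mut guard = m.borrow_mut();
        let state = guard.as_mut().expect("mapping not installed");
        let next = &mut state.next_dead;
        *state.map.entry(entity_bits).or_insert_with(|| {
            let id = *next;
            *next -= 1;
            id
        })
    })
}
```

The `Option` inside the `RefCell` is what forces the "unset/installed" lifecycle the bullet points describe; serializing without an installed mapping would panic here, matching the "fail to serialize" branch in the downsides below.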
Deserialization would look roughly like:
- An allocator is created from the destination world and saved to a thread local `RefCell`
- Deserialization happens
  - Inside `Entity`'s `Deserialize` implementation, component entity refs get mapped to IDs in the destination world
- The thread local is unset and the allocator saved back to the world
- Top-level entity ids are mapped using the mappings established in the allocator, and entities that are not present are spawned as new entities in the world
- Deserialized, entity-mapped components are added to all the living entities
This approach would have some nice upsides:
- `u32` entity IDs
- No need for manual implementation of `MapEntities`
- Retention of equality semantics between all saved component entity references
And some downsides I can think of:
- Complexity, though most of it is internal
- We lose one bit worth of ID space on references
- If entities are serialized without a mapping established, they must either fail to serialize or the mapping must be optional
  - If the mapping is optional, they need to have a 64-bit wide output to accommodate the fallback behavior
- If one wants to reattach scene entities to existing world entities, it will be a decently more involved process due to needing to be aware of the ID mapping that results from serialization.
Were this to be done at all, I think it would be followup material and out of scope for this PR, as it's more about eliminating `MapEntities` than fixing behavior, and it builds on the solution in this current PR.
I think this PR should now be ready for a push, and I might try a further PR that rids us of `MapEntities`. The only other thing we may want to pull into this PR is remapping IDs on serialization as well as deserialization, with the main benefit of being able to map live entity IDs onto a 32-bit ID space. I think it makes sense to retain the new serializer implementations for `Entity`, because if we want to keep index and generation out of the scene format, that's what's required. Otherwise, it'll show up in component entity references as the ugly `(index: 0, generation: 0)` serialization.
I would like to add a use case here: I am trying to send a `DynamicScene` over a network to patch existing entities / spawn new ones, and in this case dangling references could more easily happen just by virtue of latency. E.g. if entity 4 is spawned with components on it and is sent over, and then that entity is despawned on the server but a new one is created with the same index, we now have a weird situation if the client didn't receive the request to despawn that entity.
Right now I'm working around this by essentially having a completely separate, mandatory component on all entities, `ServerEntity(u64)`, and a custom write-to-world function for dynamic scenes.
> I am trying to send a DynamicScene over a network to patch existing entities/spawn

A bit off-topic, but I would recommend you create a new type for this, because you will need to send deletions / insertions somehow. Just implement the `MapEntities` trait (a built-in trait from Bevy) for it.
> > I am trying to send a DynamicScene over a network to patch existing entities/spawn
>
> A bit off-topic, but I would recommend you to create a new type for this. Because you will need to send deletions / insertions somehow. Just implement `MapEntities` trait (a built-in trait by Bevy) for it.
I suppose yeah, I'll just make my own map entities stuff for this.
Is this PR still live?
I've put out a PR in the same space, also dealing with `MapEntities` (#7570). It basically makes sure that when a scene is loaded/reloaded, `MapEntities` only gets applied to the components it loaded from the scene. It does this by adding a method to `ReflectMapEntities` to only apply to certain entities, not all values in the `EntityMap`, since `ReflectMapEntities` is already a per-component struct. This is currently a problem because `Parent` components pointing at the `DynamicScene` entity are added to scene objects without a parent, and those `Parent` components are being hit by the `MapEntities` behavior when a scene is reloaded.
But I'm wondering: if you're doing things so that world entities generated from scenes are dependably unique, this might not be necessary?
> Is this PR still live?
Yes, it just took me a bit to get back to it.
> I've put out a PR in the same space, also dealing with `MapEntities` (#7570). It basically makes sure that when a scene is loaded/reloaded, that MapEntities only gets applied to the components it loaded from the scene. It does this by adding a method to `ReflectMapEntities` to only apply to certain entities, not all values in the EntityMap, since ReflectMapEntities is already a per-component struct. This is currently a problem with the fact that `Parent` components are added to scene objects without a parent to point at the DynamicScene entity, and those parent components are being hit by the MapEntity behavior when a scene is reloaded.
>
> But I'm wondering if you're doing things so that world entities generated from scenes are dependably unique, this might not be necessary?
What this PR will do is make it so that, so long as something implements `MapEntities`, all of its `Entity` identifiers will reliably point to unique entities in the destination world, whether they point to dead or living entities. So I think it addresses a different problem than your PR: yours has to do with applying over existing entities in the scene, and this one addresses entity references inside scene components that point outside the scene.
(moved back to the 0.10 milestone, but it might be too late at this point so no promises!)
Moving this to 0.11 as it seems like we're going to be shipping 0.10 early today, and this isn't ready yet for a merge.
> and this isn't ready yet for a merge.
But it looks like the comments were addressed. Or do you mean that the approach is controversial? I just think scene loading can't get much more broken than it already is, and this PR fixes a pretty serious bug in it.
Need to resolve conflicts and this seems ready for a final review/merge, right?
@Illiux I started using this new approach and found mutable world access inconvenient for the `MapEntities` trait in practice. It also has a huge impact on performance if you map many components.

Mapping entities is useful for networking. Yes, I can create my own trait that just returns an error on incorrect mapping (to avoid world access), but this means that I have to create two derives for components/events: one for networking and one for scene serialization.