bevy
Inline more ECS functions
Objective
Upon closer inspection, there are a few functions in the ECS that are not being inlined, even with the highest optimizations and LTO enabled:
- Almost all WorldQuery::init_fetch calls. Affects Query::get calls in hot loops. In particular, the WorldQuery implementation for () is used everywhere as the default filter and is effectively a no-op.
- Entities::get. Affects Query::get, World::get, and any component insertion or removal.
- Entities::set. Affects any component insertion or removal.
- Tick::new. I've only seen this in component insertion and spawning.
- ArchetypeRow::new
- BlobVec::set_len
Almost all of these have trivial or even empty implementations or have significant opportunity to be optimized into surrounding code when inlined with LTO enabled.
Solution
Inline them
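As a rough illustration of the kind of change involved, here is a minimal sketch. The Tick and Filter types below are hypothetical stand-ins written for this example, not Bevy's actual definitions. The sketch only shows where an #[inline] hint goes on a trivial constructor and on an empty trait method. The hint makes the function body available for inlining in downstream crates; it does not force the compiler to inline.

```rust
/// Hypothetical stand-in for a change tick (not Bevy's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tick {
    tick: u32,
}

impl Tick {
    /// Trivial constructor. `#[inline]` lets callers in other crates
    /// inline the body even without LTO.
    #[inline]
    pub const fn new(tick: u32) -> Self {
        Self { tick }
    }

    /// Trivial accessor, hinted the same way.
    #[inline]
    pub const fn get(self) -> u32 {
        self.tick
    }
}

/// Hypothetical stand-in for a query filter trait (an assumption for
/// this sketch, much simpler than a real ECS query trait).
pub trait Filter {
    type State;
    fn init(current: Tick) -> Self::State;
}

/// The unit type acts as the "no filter" default; its setup is empty,
/// so a call to it should compile away entirely once inlined.
impl Filter for () {
    type State = ();

    #[inline]
    fn init(_current: Tick) -> Self::State {}
}

/// Small usage helper: builds a tick, runs the no-op filter setup,
/// and returns the raw tick value.
pub fn demo(raw: u32) -> u32 {
    let tick = Tick::new(raw);
    <() as Filter>::init(tick);
    tick.get()
}
```

Whether a given call is actually inlined still depends on the optimizer, so inspecting the generated assembly remains the way to confirm the effect.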
Holding off on taking this out of draft until #8053 is merged, since some of the optimizations using the inlined Entities::get and Entities::set rely on the unwrap in bundle handling being removed in release builds.
Found a few more:
I just kicked off a merge of #8053
Is there an impact on build time with those new inlines?
Good question. Definitely worth measuring. I would assume it increases the amount of generated code wherever larger functions like Entities::get and SparseSets::get are used, but it probably doesn't have that strong of an impact without LTO enabled.
@mockersf I checked the compiler output changes for this PR: https://github.com/james7132/bevy_asm_tests/commit/214de16807d1d33eeb8f7434fdc7e0f766ae1053
Seems like there's a slight increase in codegen (~100-170 instructions) for component insertion/removal, and a significant decrease for any fetch or iteration (i.e. Query iteration, Query::get, and World::get).
As a final sanity check, I reran microbenchmarks for this PR. Gains are generally pretty small or within error margin, with the exception of Query::get and World::get for sparse components.
group                                       main                              more-inline
-----                                       ----                              -----------
add_remove/sparse_set                       1.02  770.1±49.09µs    ? ?/sec    1.00  752.8±43.03µs    ? ?/sec
add_remove/table                            1.00  1146.2±35.93µs   ? ?/sec    1.00  1148.6±36.77µs   ? ?/sec
add_remove_big/sparse_set                   1.00  822.2±131.87µs   ? ?/sec    1.01  828.3±155.04µs   ? ?/sec
add_remove_big/table                        1.01  2.4±0.11ms       ? ?/sec    1.00  2.4±0.45ms       ? ?/sec
get_or_spawn/batched                        1.01  307.0±17.62µs    ? ?/sec    1.00  303.8±13.68µs    ? ?/sec
get_or_spawn/individual                     1.01  488.8±56.75µs    ? ?/sec    1.00  483.0±55.45µs    ? ?/sec
heavy_compute/base                          1.02  219.6±26.82µs    ? ?/sec    1.00  214.5±2.24µs     ? ?/sec
insert_commands/insert                      1.00  373.1±28.46µs    ? ?/sec    1.01  376.1±34.78µs    ? ?/sec
insert_commands/insert_batch                1.01  301.4±22.15µs    ? ?/sec    1.00  299.6±18.21µs    ? ?/sec
insert_simple/base                          1.00  362.8±6.63µs     ? ?/sec    1.00  361.3±6.05µs     ? ?/sec
insert_simple/unbatched                     1.00  754.7±14.42µs    ? ?/sec    1.00  751.0±40.23µs    ? ?/sec
iter_fragmented/base                        1.00  346.2±10.32ns    ? ?/sec    1.00  344.7±7.17ns     ? ?/sec
iter_fragmented/foreach                     1.00  161.6±21.58ns    ? ?/sec    1.07  173.4±30.21ns    ? ?/sec
iter_fragmented/foreach_wide                1.02  3.8±0.09µs       ? ?/sec    1.00  3.7±0.07µs       ? ?/sec
iter_fragmented/wide                        1.07  4.1±0.26µs       ? ?/sec    1.00  3.8±0.10µs       ? ?/sec
iter_fragmented_sparse/base                 1.00  7.6±0.20ns       ? ?/sec    1.05  7.9±0.37ns       ? ?/sec
iter_fragmented_sparse/foreach              1.01  7.8±0.31ns       ? ?/sec    1.00  7.8±0.26ns       ? ?/sec
iter_fragmented_sparse/foreach_wide         1.00  40.1±1.55ns      ? ?/sec    1.01  40.6±2.27ns      ? ?/sec
iter_fragmented_sparse/wide                 1.00  42.1±1.40ns      ? ?/sec    1.00  42.3±1.50ns      ? ?/sec
iter_simple/base                            1.07  8.9±0.23µs       ? ?/sec    1.00  8.3±0.08µs       ? ?/sec
iter_simple/foreach                         1.00  8.4±0.28µs       ? ?/sec    1.00  8.4±0.17µs       ? ?/sec
iter_simple/foreach_sparse_set              1.03  26.9±0.53µs      ? ?/sec    1.00  26.1±0.58µs      ? ?/sec
iter_simple/foreach_wide                    1.00  41.8±0.87µs      ? ?/sec    1.03  43.2±0.24µs      ? ?/sec
iter_simple/foreach_wide_sparse_set         1.00  114.9±1.70µs     ? ?/sec    1.04  119.8±3.59µs     ? ?/sec
iter_simple/sparse_set                      1.00  29.3±0.77µs      ? ?/sec    1.00  29.3±0.55µs      ? ?/sec
iter_simple/system                          1.00  8.5±0.17µs       ? ?/sec    1.00  8.5±0.29µs       ? ?/sec
iter_simple/wide                            1.00  39.4±0.74µs      ? ?/sec    1.00  39.3±0.35µs      ? ?/sec
iter_simple/wide_sparse_set                 1.02  129.2±6.91µs     ? ?/sec    1.00  127.1±4.60µs     ? ?/sec
query_get/50000_entities_sparse             1.00  308.6±5.09µs     ? ?/sec    1.00  308.3±1.44µs     ? ?/sec
query_get/50000_entities_table              1.00  266.0±1.41µs     ? ?/sec    1.00  266.0±2.35µs     ? ?/sec
query_get_component/50000_entities_sparse   1.03  742.1±39.70µs    ? ?/sec    1.00  722.9±24.46µs    ? ?/sec
query_get_component/50000_entities_table    1.00  757.5±13.13µs    ? ?/sec    1.00  757.3±32.86µs    ? ?/sec
query_get_component_simple/system           1.00  561.9±7.47µs     ? ?/sec    1.01  565.6±10.25µs    ? ?/sec
query_get_component_simple/unchecked        1.00  716.6±8.94µs     ? ?/sec    1.00  717.4±6.52µs     ? ?/sec
query_get_many_10/50000_calls_sparse        1.13  4.9±0.64ms       ? ?/sec    1.00  4.3±0.75ms       ? ?/sec
query_get_many_10/50000_calls_table         1.13  4.5±0.63ms       ? ?/sec    1.00  3.9±0.16ms       ? ?/sec
query_get_many_2/50000_calls_sparse         1.06  708.4±63.05µs    ? ?/sec    1.00  668.9±145.37µs   ? ?/sec
query_get_many_2/50000_calls_table          1.04  719.4±81.49µs    ? ?/sec    1.00  692.2±34.30µs    ? ?/sec
query_get_many_5/50000_calls_sparse         1.18  2.1±0.35ms       ? ?/sec    1.00  1755.6±141.57µs  ? ?/sec
query_get_many_5/50000_calls_table          1.05  1906.7±231.01µs  ? ?/sec    1.00  1813.9±96.22µs   ? ?/sec
spawn_commands/2000_entities                1.03  182.8±13.81µs    ? ?/sec    1.00  177.0±6.97µs     ? ?/sec
spawn_commands/4000_entities                1.03  367.3±25.23µs    ? ?/sec    1.00  356.4±12.60µs    ? ?/sec
spawn_commands/6000_entities                1.00  520.8±28.99µs    ? ?/sec    1.03  535.2±24.06µs    ? ?/sec
spawn_commands/8000_entities                1.01  742.7±41.84µs    ? ?/sec    1.00  734.1±34.18µs    ? ?/sec
spawn_world/10000_entities                  1.03  898.1±118.55µs   ? ?/sec    1.00  873.4±86.67µs    ? ?/sec
spawn_world/1000_entities                   1.08  93.2±11.56µs     ? ?/sec    1.00  86.4±8.39µs      ? ?/sec
spawn_world/100_entities                    1.09  9.6±1.43µs       ? ?/sec    1.00  8.8±0.89µs       ? ?/sec
spawn_world/10_entities                     1.00  895.9±157.15ns   ? ?/sec    1.01  908.4±87.62ns    ? ?/sec
spawn_world/1_entities                      1.04  93.4±14.79ns     ? ?/sec    1.00  89.7±12.34ns     ? ?/sec
world_entity/50000_entities                 1.05  104.9±12.09µs    ? ?/sec    1.00  100.1±0.19µs     ? ?/sec
world_get/50000_entities_sparse             1.10  227.0±32.64µs    ? ?/sec    1.00  205.7±1.00µs     ? ?/sec
world_get/50000_entities_table              1.01  172.8±3.20µs     ? ?/sec    1.00  171.4±1.96µs     ? ?/sec
world_query_for_each/50000_entities_sparse  1.00  53.6±0.83µs      ? ?/sec    1.00  53.6±0.17µs      ? ?/sec
world_query_for_each/50000_entities_table   1.00  27.2±0.22µs      ? ?/sec    1.00  27.2±0.16µs      ? ?/sec
world_query_get/50000_entities_sparse       1.04  100.9±8.07µs     ? ?/sec    1.00  96.8±0.44µs      ? ?/sec
world_query_get/50000_entities_sparse_wide  1.00  194.5±2.70µs     ? ?/sec    1.00  195.0±0.42µs     ? ?/sec
world_query_get/50000_entities_table        1.00  126.6±7.07µs     ? ?/sec    1.00  126.2±1.19µs     ? ?/sec
world_query_get/50000_entities_table_wide   1.03  235.5±3.84µs     ? ?/sec    1.00  229.4±2.03µs     ? ?/sec
world_query_iter/50000_entities_sparse      1.00  54.0±0.47µs      ? ?/sec    1.00  53.8±0.14µs      ? ?/sec
world_query_iter/50000_entities_table       1.00  27.2±0.18µs      ? ?/sec    1.00  27.2±0.08µs      ? ?/sec
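For reading the table above: the number before each timing appears to be that branch's time divided by the fastest time in the row, so 1.00 marks the faster branch. That reading is an assumption based on how benchmark comparison tools usually present results, and relative_ratios is a hypothetical helper written for this example. A minimal sketch of the calculation:

```rust
/// Returns each time divided by the fastest (smallest) time in the slice.
/// All times must be in the same unit. An empty slice yields an empty Vec.
pub fn relative_ratios(times: &[f64]) -> Vec<f64> {
    // Find the smallest time; stays INFINITY only if the slice is empty.
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}

/// Usage helper: ratios for the world_get/50000_entities_sparse row,
/// with both times in microseconds (main first, more-inline second).
pub fn world_get_sparse_ratios() -> Vec<f64> {
    relative_ratios(&[227.0, 205.7])
}
```

For that row the first ratio comes out near 1.10 and the second is exactly 1.0, which matches the reported columns. Rows whose times are printed with fewer digits (for example 2.1ms) may not reproduce the reported ratio exactly from the rounded values.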
Did two builds just to check for major build time regressions.
This PR: 1m 14s Base branch of this PR: 1m 16s
No significant changes. Maybe slightly faster but probably just within the noise.