Inline more ECS functions

Open james7132 opened this issue 2 years ago • 1 comments

Objective

Upon closer inspection, there are a few functions in the ECS that are not being inlined, even at the highest optimization level and with LTO enabled:

  • Almost all WorldQuery::init_fetch calls. Affects Query::get calls in hot loops. In particular, the WorldQuery implementation for () is used everywhere as the default filter and is effectively a no-op.
  • Entities::get. Affects Query::get, World::get, and any component insertion or removal.
  • Entities::set. Affects any component insertion or removal.
  • Tick::new. I've only seen this in component insertion and spawning.
  • ArchetypeRow::new
  • BlobVec::set_len

Almost all of these have trivial or even empty implementations, or have significant opportunity to be optimized into the surrounding code when inlined with LTO enabled.

Solution

Inline them
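
For readers unfamiliar with what this looks like in practice, here is a minimal sketch. The `Tick` struct and `InitFetch` trait below are hypothetical, simplified stand-ins that only borrow names from the list above; they are not Bevy's actual definitions. The sketch assumes the change amounts to adding `#[inline]` to trivial functions like these.

```rust
/// Hypothetical stand-in for a change-detection tick (not Bevy's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tick {
    tick: u32,
}

impl Tick {
    /// Trivial constructor. `#[inline]` is a hint that also makes the body
    /// available for inlining in downstream crates without relying on LTO.
    #[inline]
    pub const fn new(tick: u32) -> Self {
        Self { tick }
    }

    /// Trivial accessor, marked the same way.
    #[inline]
    pub const fn get(self) -> u32 {
        self.tick
    }
}

/// Hypothetical, heavily simplified fetch-initialization trait.
pub trait InitFetch {
    type Fetch;
    fn init_fetch(last_run: Tick) -> Self::Fetch;
}

/// The unit type used as a default filter has nothing to set up,
/// so its `init_fetch` body is empty and should cost nothing once inlined.
impl InitFetch for () {
    type Fetch = ();

    #[inline]
    fn init_fetch(_last_run: Tick) -> Self::Fetch {}
}

/// Small usage example: build a tick, run the no-op fetch init, read it back.
pub fn demo() -> u32 {
    let tick = Tick::new(7);
    <() as InitFetch>::init_fetch(tick);
    tick.get()
}
```

Note that `#[inline]` is only a hint; whether the call actually disappears still has to be confirmed by inspecting the generated assembly.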

james7132 avatar Mar 14 '23 09:03 james7132

Holding off on taking this out of draft until #8053 is merged, since some of the optimizations using the inlined Entities::get and Entities::set rely on the unwrap when handling bundles being removed in release builds.

james7132 avatar Mar 14 '23 09:03 james7132

I just kicked off a merge of #8053

cart avatar Mar 21 '23 20:03 cart

Is there an impact on build time with those new inlines?

mockersf avatar Mar 28 '23 21:03 mockersf

Is there an impact on build time with those new inlines?

Good question. Definitely worth measuring. I would assume it increases the amount of generated code wherever larger functions like Entities::get and SparseSets::get are used, but it probably doesn't have much of an impact without LTO enabled.

james7132 avatar Mar 28 '23 21:03 james7132

@mockersf I checked the compiler output changes for this PR: https://github.com/james7132/bevy_asm_tests/commit/214de16807d1d33eeb8f7434fdc7e0f766ae1053

Seems like there's a slight increase in codegen (~100-170 instructions) for component insertion/removal, and a significant decrease for any fetch or iteration (i.e. Query iteration, Query::get, and World::get).

james7132 avatar Apr 11 '23 19:04 james7132

As a final sanity check, I reran microbenchmarks for this PR. Gains are generally pretty small or within error margin, with the exception of Query::get and World::get for sparse components.

group                                           main                                    more-inline
-----                                           ----                                    -----------
add_remove/sparse_set                           1.02   770.1±49.09µs        ? ?/sec     1.00   752.8±43.03µs        ? ?/sec
add_remove/table                                1.00  1146.2±35.93µs        ? ?/sec     1.00  1148.6±36.77µs        ? ?/sec
add_remove_big/sparse_set                       1.00  822.2±131.87µs        ? ?/sec     1.01  828.3±155.04µs        ? ?/sec
add_remove_big/table                            1.01      2.4±0.11ms        ? ?/sec     1.00      2.4±0.45ms        ? ?/sec
get_or_spawn/batched                            1.01   307.0±17.62µs        ? ?/sec     1.00   303.8±13.68µs        ? ?/sec
get_or_spawn/individual                         1.01   488.8±56.75µs        ? ?/sec     1.00   483.0±55.45µs        ? ?/sec
heavy_compute/base                              1.02   219.6±26.82µs        ? ?/sec     1.00    214.5±2.24µs        ? ?/sec
insert_commands/insert                          1.00   373.1±28.46µs        ? ?/sec     1.01   376.1±34.78µs        ? ?/sec
insert_commands/insert_batch                    1.01   301.4±22.15µs        ? ?/sec     1.00   299.6±18.21µs        ? ?/sec
insert_simple/base                              1.00    362.8±6.63µs        ? ?/sec     1.00    361.3±6.05µs        ? ?/sec
insert_simple/unbatched                         1.00   754.7±14.42µs        ? ?/sec     1.00   751.0±40.23µs        ? ?/sec
iter_fragmented/base                            1.00   346.2±10.32ns        ? ?/sec     1.00    344.7±7.17ns        ? ?/sec
iter_fragmented/foreach                         1.00   161.6±21.58ns        ? ?/sec     1.07   173.4±30.21ns        ? ?/sec
iter_fragmented/foreach_wide                    1.02      3.8±0.09µs        ? ?/sec     1.00      3.7±0.07µs        ? ?/sec
iter_fragmented/wide                            1.07      4.1±0.26µs        ? ?/sec     1.00      3.8±0.10µs        ? ?/sec
iter_fragmented_sparse/base                     1.00      7.6±0.20ns        ? ?/sec     1.05      7.9±0.37ns        ? ?/sec
iter_fragmented_sparse/foreach                  1.01      7.8±0.31ns        ? ?/sec     1.00      7.8±0.26ns        ? ?/sec
iter_fragmented_sparse/foreach_wide             1.00     40.1±1.55ns        ? ?/sec     1.01     40.6±2.27ns        ? ?/sec
iter_fragmented_sparse/wide                     1.00     42.1±1.40ns        ? ?/sec     1.00     42.3±1.50ns        ? ?/sec
iter_simple/base                                1.07      8.9±0.23µs        ? ?/sec     1.00      8.3±0.08µs        ? ?/sec
iter_simple/foreach                             1.00      8.4±0.28µs        ? ?/sec     1.00      8.4±0.17µs        ? ?/sec
iter_simple/foreach_sparse_set                  1.03     26.9±0.53µs        ? ?/sec     1.00     26.1±0.58µs        ? ?/sec
iter_simple/foreach_wide                        1.00     41.8±0.87µs        ? ?/sec     1.03     43.2±0.24µs        ? ?/sec
iter_simple/foreach_wide_sparse_set             1.00    114.9±1.70µs        ? ?/sec     1.04    119.8±3.59µs        ? ?/sec
iter_simple/sparse_set                          1.00     29.3±0.77µs        ? ?/sec     1.00     29.3±0.55µs        ? ?/sec
iter_simple/system                              1.00      8.5±0.17µs        ? ?/sec     1.00      8.5±0.29µs        ? ?/sec
iter_simple/wide                                1.00     39.4±0.74µs        ? ?/sec     1.00     39.3±0.35µs        ? ?/sec
iter_simple/wide_sparse_set                     1.02    129.2±6.91µs        ? ?/sec     1.00    127.1±4.60µs        ? ?/sec
query_get/50000_entities_sparse                 1.00    308.6±5.09µs        ? ?/sec     1.00    308.3±1.44µs        ? ?/sec
query_get/50000_entities_table                  1.00    266.0±1.41µs        ? ?/sec     1.00    266.0±2.35µs        ? ?/sec
query_get_component/50000_entities_sparse       1.03   742.1±39.70µs        ? ?/sec     1.00   722.9±24.46µs        ? ?/sec
query_get_component/50000_entities_table        1.00   757.5±13.13µs        ? ?/sec     1.00   757.3±32.86µs        ? ?/sec
query_get_component_simple/system               1.00    561.9±7.47µs        ? ?/sec     1.01   565.6±10.25µs        ? ?/sec
query_get_component_simple/unchecked            1.00    716.6±8.94µs        ? ?/sec     1.00    717.4±6.52µs        ? ?/sec
query_get_many_10/50000_calls_sparse            1.13      4.9±0.64ms        ? ?/sec     1.00      4.3±0.75ms        ? ?/sec
query_get_many_10/50000_calls_table             1.13      4.5±0.63ms        ? ?/sec     1.00      3.9±0.16ms        ? ?/sec
query_get_many_2/50000_calls_sparse             1.06   708.4±63.05µs        ? ?/sec     1.00  668.9±145.37µs        ? ?/sec
query_get_many_2/50000_calls_table              1.04   719.4±81.49µs        ? ?/sec     1.00   692.2±34.30µs        ? ?/sec
query_get_many_5/50000_calls_sparse             1.18      2.1±0.35ms        ? ?/sec     1.00  1755.6±141.57µs        ? ?/sec
query_get_many_5/50000_calls_table              1.05  1906.7±231.01µs        ? ?/sec    1.00  1813.9±96.22µs        ? ?/sec
spawn_commands/2000_entities                    1.03   182.8±13.81µs        ? ?/sec     1.00    177.0±6.97µs        ? ?/sec
spawn_commands/4000_entities                    1.03   367.3±25.23µs        ? ?/sec     1.00   356.4±12.60µs        ? ?/sec
spawn_commands/6000_entities                    1.00   520.8±28.99µs        ? ?/sec     1.03   535.2±24.06µs        ? ?/sec
spawn_commands/8000_entities                    1.01   742.7±41.84µs        ? ?/sec     1.00   734.1±34.18µs        ? ?/sec
spawn_world/10000_entities                      1.03  898.1±118.55µs        ? ?/sec     1.00   873.4±86.67µs        ? ?/sec
spawn_world/1000_entities                       1.08    93.2±11.56µs        ? ?/sec     1.00     86.4±8.39µs        ? ?/sec
spawn_world/100_entities                        1.09      9.6±1.43µs        ? ?/sec     1.00      8.8±0.89µs        ? ?/sec
spawn_world/10_entities                         1.00  895.9±157.15ns        ? ?/sec     1.01   908.4±87.62ns        ? ?/sec
spawn_world/1_entities                          1.04    93.4±14.79ns        ? ?/sec     1.00    89.7±12.34ns        ? ?/sec
world_entity/50000_entities                     1.05   104.9±12.09µs        ? ?/sec     1.00    100.1±0.19µs        ? ?/sec
world_get/50000_entities_sparse                 1.10   227.0±32.64µs        ? ?/sec     1.00    205.7±1.00µs        ? ?/sec
world_get/50000_entities_table                  1.01    172.8±3.20µs        ? ?/sec     1.00    171.4±1.96µs        ? ?/sec
world_query_for_each/50000_entities_sparse      1.00     53.6±0.83µs        ? ?/sec     1.00     53.6±0.17µs        ? ?/sec
world_query_for_each/50000_entities_table       1.00     27.2±0.22µs        ? ?/sec     1.00     27.2±0.16µs        ? ?/sec
world_query_get/50000_entities_sparse           1.04    100.9±8.07µs        ? ?/sec     1.00     96.8±0.44µs        ? ?/sec
world_query_get/50000_entities_sparse_wide      1.00    194.5±2.70µs        ? ?/sec     1.00    195.0±0.42µs        ? ?/sec
world_query_get/50000_entities_table            1.00    126.6±7.07µs        ? ?/sec     1.00    126.2±1.19µs        ? ?/sec
world_query_get/50000_entities_table_wide       1.03    235.5±3.84µs        ? ?/sec     1.00    229.4±2.03µs        ? ?/sec
world_query_iter/50000_entities_sparse          1.00     54.0±0.47µs        ? ?/sec     1.00     53.8±0.14µs        ? ?/sec
world_query_iter/50000_entities_table           1.00     27.2±0.18µs        ? ?/sec     1.00     27.2±0.08µs        ? ?/sec
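
A note on reading the table: the leading number in each column appears to be that branch's mean time divided by the faster of the two means, so 1.00 marks the faster side. The tool that produced the table is not named in the thread, so this interpretation is an assumption, though it matches the rows checked here. A minimal sketch of that arithmetic, with a hypothetical helper name:

```rust
/// Ratio of `mean` to the faster of the two means (both in the same unit).
/// Hypothetical helper for illustration; not part of any benchmarking tool.
pub fn relative_to_fastest(mean: f64, other: f64) -> f64 {
    mean / mean.min(other)
}

/// Worked example using the world_get/50000_entities_sparse row:
/// main = 227.0 µs, more-inline = 205.7 µs.
pub fn world_get_sparse_ratios() -> (f64, f64) {
    let main = relative_to_fastest(227.0, 205.7); // about 1.10
    let more_inline = relative_to_fastest(205.7, 227.0); // exactly 1.0
    (main, more_inline)
}
```

Because the ± spread on many rows is as large as the difference between the means, ratios close to 1.00 should be treated as noise rather than as a real change.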

james7132 avatar Apr 11 '23 23:04 james7132

Did two builds just to check for major build time regressions.

This PR: 1m 14s
Base branch of this PR: 1m 16s

No significant changes. Maybe slightly faster but probably just within the noise.

cart avatar Apr 12 '23 19:04 cart