Override QueryIter::fold to port Query::for_each perf gains to select Iterator combinators

Open james7132 opened this issue 2 years ago • 5 comments

Objective

Since #6547, Query::for_each has been capable of automatic vectorization on certain queries, which yields notable improvements (>50% of CPU time) for iteration. However, Query::for_each isn't idiomatic Rust and lacks the flexibility of iterator combinators.

Ideally, Query::iter and friends should be able to achieve the same results. However, this does seem to be blocked upstream (rust-lang/rust#104914) by Rust's loop optimizations.

Solution

This is an intermediate solution and refactor. This moves the Query::for_each implementation onto the Iterator::fold implementation for QueryIter instead. This should result in the same automatic vectorization optimization on all Iterator functions that internally use fold, including Iterator::for_each, Iterator::count, etc.
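To illustrate the idea outside of Bevy, here is a minimal sketch of an iterator that overrides fold. TableIter and the two helper functions are hypothetical stand-ins, not Bevy types or the PR's code: the iterator walks several "tables" (plain Vec<u32>s) in order, roughly the way a query walks tables or archetypes. The standard library's default Iterator::for_each and Iterator::count are written in terms of fold, so they pick up the override without any further changes.

```rust
/// Hypothetical stand-in for a query iterator: yields every element of
/// every table, one table after another.
struct TableIter<'a> {
    tables: std::slice::Iter<'a, Vec<u32>>,
    current: std::slice::Iter<'a, u32>,
}

impl<'a> TableIter<'a> {
    fn new(tables: &'a [Vec<u32>]) -> Self {
        // Start with an empty "current table"; next() and fold() both
        // move on to the first real table from here.
        let empty: &'a [u32] = &[];
        Self { tables: tables.iter(), current: empty.iter() }
    }
}

impl<'a> Iterator for TableIter<'a> {
    type Item = &'a u32;

    // External iteration: every call has to re-check whether the current
    // table is exhausted and whether another table is left.
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(item) = self.current.next() {
                return Some(item);
            }
            self.current = self.tables.next()?.iter();
        }
    }

    // Internal iteration: one plain loop per table, with no per-item
    // state machine. This is the shape the optimizer tends to handle well.
    fn fold<B, F>(self, init: B, mut f: F) -> B
    where
        F: FnMut(B, Self::Item) -> B,
    {
        let TableIter { tables, current } = self;
        let mut acc = init;
        // Finish whatever is left of a partially consumed table first.
        for item in current {
            acc = f(acc, item);
        }
        for table in tables {
            for item in table {
                acc = f(acc, item);
            }
        }
        acc
    }
}

/// Sums through next(), i.e. a regular for loop.
fn sum_with_for_loop(tables: &[Vec<u32>]) -> u32 {
    let mut total = 0;
    for x in TableIter::new(tables) {
        total += *x;
    }
    total
}

/// Sums through Iterator::for_each, which goes through the fold override.
fn sum_with_for_each(tables: &[Vec<u32>]) -> u32 {
    let mut total = 0;
    TableIter::new(tables).for_each(|x| total += *x);
    total
}
```

Both helpers return the same value for the same input; only the code path differs. Whether the fold path actually vectorizes depends on the element type and the closure body, so this shows the structure rather than a guaranteed speedup.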

This should close the gap between the two completely. Internally, this PR changes Query::for_each to use query.iter().for_each(..) instead of the duplicated implementation.

Separately, the duplicate implementations of internal iteration (i.e. Query::par_for_each) now use portions of the current Query::for_each implementation factored out into their own functions.

This also massively cleans up our internal fragmentation of internal iteration options, deduplicating the iteration code used in for_each and par_iter().for_each().


Changelog

Changed: Query::for_each and Query::for_each_mut have been moved to QueryIter's Iterator::for_each implementation, and still retain their performance improvements over normal iteration. These APIs are deprecated in 0.10 and will be removed in 0.11.

james7132 avatar Nov 27 '22 09:11 james7132

Oh, man, this hadn't happened yet? It came up on Discord back in Feb 2021; I guess I should have opened an issue.

As an FYI, there's a chance that implementing try_fold will happen even before all the traits are stabilized: https://github.com/rust-lang/rust/issues/84277#issuecomment-1197104361. If you end up needing that, please post about it to the tracking issue.

scottmcm avatar Nov 27 '22 11:11 scottmcm

Finally was able to address the performance issues, so I'm taking this out of draft. Benchmark results:

group                                           main                                    query-iter-fold
-----                                           ----                                    ---------------
add_remove/sparse_set                           1.00   783.2±35.54µs        ? ?/sec     1.03   808.5±73.03µs        ? ?/sec
add_remove/table                                1.00   1141.2±8.12µs        ? ?/sec     1.03  1178.1±35.55µs        ? ?/sec
add_remove_big/sparse_set                       1.00  901.6±213.54µs        ? ?/sec     1.04  936.2±248.48µs        ? ?/sec
add_remove_big/table                            1.00      2.4±0.01ms        ? ?/sec     1.01      2.4±0.06ms        ? ?/sec
added_archetypes/archetype_count/100            1.15   261.0±11.59µs        ? ?/sec     1.00   226.4±10.16µs        ? ?/sec
added_archetypes/archetype_count/1000           1.30   901.8±57.76µs        ? ?/sec     1.00   695.1±40.32µs        ? ?/sec
added_archetypes/archetype_count/10000          1.00      9.5±0.53ms        ? ?/sec     1.02      9.7±0.77ms        ? ?/sec
added_archetypes/archetype_count/200            1.15   330.8±15.14µs        ? ?/sec     1.00   286.8±18.38µs        ? ?/sec
added_archetypes/archetype_count/2000           1.20  1601.8±181.73µs        ? ?/sec    1.00  1332.3±105.68µs        ? ?/sec
added_archetypes/archetype_count/500            1.29   581.2±35.53µs        ? ?/sec     1.00   452.2±33.39µs        ? ?/sec
added_archetypes/archetype_count/5000           1.00      3.5±0.20ms        ? ?/sec     1.00      3.5±0.24ms        ? ?/sec
build_schedule/1000_schedule                    1.00       3.4±0.07s        ? ?/sec     1.01       3.4±0.09s        ? ?/sec
build_schedule/1000_schedule_noconstraints      1.00    154.3±3.29ms        ? ?/sec     1.00    154.6±3.22ms        ? ?/sec
build_schedule/100_schedule                     1.02     19.4±0.08ms        ? ?/sec     1.00     18.9±0.13ms        ? ?/sec
build_schedule/100_schedule_noconstraints       1.19      2.1±0.02ms        ? ?/sec     1.00  1790.3±31.90µs        ? ?/sec
build_schedule/500_schedule                     1.00    622.0±5.47ms        ? ?/sec     1.06   661.9±12.40ms        ? ?/sec
build_schedule/500_schedule_noconstraints       1.03     37.9±0.40ms        ? ?/sec     1.00     36.9±0.73ms        ? ?/sec
busy_systems/01x_entities_03_systems            1.25     40.1±1.40µs        ? ?/sec     1.00     32.0±1.42µs        ? ?/sec
busy_systems/01x_entities_06_systems            1.24     72.5±1.57µs        ? ?/sec     1.00     58.3±2.88µs        ? ?/sec
busy_systems/01x_entities_09_systems            1.16     92.3±1.62µs        ? ?/sec     1.00     79.7±1.78µs        ? ?/sec
busy_systems/01x_entities_12_systems            1.14    114.5±2.03µs        ? ?/sec     1.00    100.3±3.22µs        ? ?/sec
busy_systems/01x_entities_15_systems            1.20    138.7±2.29µs        ? ?/sec     1.00    115.1±3.32µs        ? ?/sec
busy_systems/02x_entities_03_systems            1.18     57.3±1.76µs        ? ?/sec     1.00     48.4±2.55µs        ? ?/sec
busy_systems/02x_entities_06_systems            1.24    106.2±2.51µs        ? ?/sec     1.00     85.5±3.02µs        ? ?/sec
busy_systems/02x_entities_09_systems            1.18    146.6±4.28µs        ? ?/sec     1.00    124.7±4.55µs        ? ?/sec
busy_systems/02x_entities_12_systems            1.17    183.7±4.88µs        ? ?/sec     1.00    156.9±4.99µs        ? ?/sec
busy_systems/02x_entities_15_systems            1.16    222.8±6.38µs        ? ?/sec     1.00    191.8±4.95µs        ? ?/sec
busy_systems/03x_entities_03_systems            1.18     74.6±3.76µs        ? ?/sec     1.00     63.4±2.35µs        ? ?/sec
busy_systems/03x_entities_06_systems            1.40    153.2±4.98µs        ? ?/sec     1.00    109.6±3.39µs        ? ?/sec
busy_systems/03x_entities_09_systems            1.17    196.4±7.89µs        ? ?/sec     1.00    167.9±4.04µs        ? ?/sec
busy_systems/03x_entities_12_systems            1.18    255.1±6.12µs        ? ?/sec     1.00    216.4±6.41µs        ? ?/sec
busy_systems/03x_entities_15_systems            1.16    312.1±9.17µs        ? ?/sec     1.00    269.3±8.96µs        ? ?/sec
busy_systems/04x_entities_03_systems            1.17     95.6±4.21µs        ? ?/sec     1.00     81.7±2.76µs        ? ?/sec
busy_systems/04x_entities_06_systems            1.10    154.3±7.39µs        ? ?/sec     1.00    140.5±8.90µs        ? ?/sec
busy_systems/04x_entities_09_systems            1.20    249.7±7.97µs        ? ?/sec     1.00    207.5±7.37µs        ? ?/sec
busy_systems/04x_entities_12_systems            1.23   339.1±12.19µs        ? ?/sec     1.00    275.6±7.88µs        ? ?/sec
busy_systems/04x_entities_15_systems            1.25   425.0±14.00µs        ? ?/sec     1.00   339.3±13.37µs        ? ?/sec
busy_systems/05x_entities_03_systems            1.22    116.1±5.73µs        ? ?/sec     1.00     95.4±4.06µs        ? ?/sec
busy_systems/05x_entities_06_systems            1.21    206.6±9.36µs        ? ?/sec     1.00    170.9±4.78µs        ? ?/sec
busy_systems/05x_entities_09_systems            1.25   308.7±18.26µs        ? ?/sec     1.00    246.2±5.96µs        ? ?/sec
busy_systems/05x_entities_12_systems            1.32   430.0±19.66µs        ? ?/sec     1.00    324.7±9.28µs        ? ?/sec
busy_systems/05x_entities_15_systems            1.24   501.0±11.96µs        ? ?/sec     1.00    403.3±9.83µs        ? ?/sec
contrived/01x_entities_03_systems               1.27     36.2±1.16µs        ? ?/sec     1.00     28.4±1.42µs        ? ?/sec
contrived/01x_entities_06_systems               1.28     58.0±1.10µs        ? ?/sec     1.00     45.3±1.36µs        ? ?/sec
contrived/01x_entities_09_systems               1.30     77.7±1.47µs        ? ?/sec     1.00     59.6±2.95µs        ? ?/sec
contrived/01x_entities_12_systems               1.30     97.7±1.31µs        ? ?/sec     1.00     75.4±2.91µs        ? ?/sec
contrived/01x_entities_15_systems               1.27    114.3±8.74µs        ? ?/sec     1.00     90.0±4.64µs        ? ?/sec
contrived/02x_entities_03_systems               1.19     45.1±1.10µs        ? ?/sec     1.00     37.8±1.92µs        ? ?/sec
contrived/02x_entities_06_systems               1.21     77.2±1.62µs        ? ?/sec     1.00     63.9±2.74µs        ? ?/sec
contrived/02x_entities_09_systems               1.19    103.4±1.85µs        ? ?/sec     1.00     86.6±3.58µs        ? ?/sec
contrived/02x_entities_12_systems               1.22    134.5±2.26µs        ? ?/sec     1.00    110.0±4.38µs        ? ?/sec
contrived/02x_entities_15_systems               1.24    163.1±2.54µs        ? ?/sec     1.00    131.1±4.89µs        ? ?/sec
contrived/03x_entities_03_systems               1.19     54.5±1.74µs        ? ?/sec     1.00     45.7±1.91µs        ? ?/sec
contrived/03x_entities_06_systems               1.36     96.6±3.39µs        ? ?/sec     1.00     71.2±2.63µs        ? ?/sec
contrived/03x_entities_09_systems               1.31    129.8±4.89µs        ? ?/sec     1.00     99.1±2.95µs        ? ?/sec
contrived/03x_entities_12_systems               1.32    164.7±2.86µs        ? ?/sec     1.00    124.7±4.15µs        ? ?/sec
contrived/03x_entities_15_systems               1.33    200.1±4.71µs        ? ?/sec     1.00    150.7±4.68µs        ? ?/sec
contrived/04x_entities_03_systems               1.20     63.3±1.55µs        ? ?/sec     1.00     52.7±2.79µs        ? ?/sec
contrived/04x_entities_06_systems               1.34    112.3±2.68µs        ? ?/sec     1.00     83.6±2.81µs        ? ?/sec
contrived/04x_entities_09_systems               1.29    151.3±2.71µs        ? ?/sec     1.00    117.4±3.46µs        ? ?/sec
contrived/04x_entities_12_systems               1.32    194.1±4.25µs        ? ?/sec     1.00    147.3±4.79µs        ? ?/sec
contrived/04x_entities_15_systems               1.33    239.3±7.97µs        ? ?/sec     1.00    179.4±4.79µs        ? ?/sec
contrived/05x_entities_03_systems               1.19     69.3±1.63µs        ? ?/sec     1.00     58.2±1.63µs        ? ?/sec
contrived/05x_entities_06_systems               1.27    121.1±2.50µs        ? ?/sec     1.00     95.1±4.32µs        ? ?/sec
contrived/05x_entities_09_systems               1.27    166.2±3.55µs        ? ?/sec     1.00    131.0±4.68µs        ? ?/sec
contrived/05x_entities_12_systems               1.27    218.8±4.35µs        ? ?/sec     1.00    172.7±7.30µs        ? ?/sec
contrived/05x_entities_15_systems               1.27    268.7±8.17µs        ? ?/sec     1.00    211.0±8.86µs        ? ?/sec
empty_commands/0_entities                       1.07      4.4±0.02ns        ? ?/sec     1.00      4.1±0.02ns        ? ?/sec
empty_systems/000_systems                       1.00      6.1±0.15ns        ? ?/sec     1.00      6.1±0.09ns        ? ?/sec
empty_systems/001_systems                       1.85     17.4±1.10µs        ? ?/sec     1.00      9.4±2.41µs        ? ?/sec
empty_systems/002_systems                       1.26     18.7±1.14µs        ? ?/sec     1.00     14.8±1.74µs        ? ?/sec
empty_systems/003_systems                       1.24     19.8±1.07µs        ? ?/sec     1.00     16.0±1.02µs        ? ?/sec
empty_systems/004_systems                       1.49     19.9±1.29µs        ? ?/sec     1.00     13.4±0.77µs        ? ?/sec
empty_systems/005_systems                       1.36     19.6±1.59µs        ? ?/sec     1.00     14.4±0.97µs        ? ?/sec
empty_systems/010_systems                       1.28     27.5±0.75µs        ? ?/sec     1.00     21.5±1.44µs        ? ?/sec
empty_systems/015_systems                       1.44     36.8±0.53µs        ? ?/sec     1.00     25.6±2.57µs        ? ?/sec
empty_systems/020_systems                       1.32     41.1±1.60µs        ? ?/sec     1.00     31.1±3.33µs        ? ?/sec
empty_systems/025_systems                       1.27     47.2±1.01µs        ? ?/sec     1.00     37.3±2.43µs        ? ?/sec
empty_systems/030_systems                       1.35     53.5±1.18µs        ? ?/sec     1.00     39.7±2.71µs        ? ?/sec
empty_systems/035_systems                       1.39     59.7±1.32µs        ? ?/sec     1.00     42.8±2.83µs        ? ?/sec
empty_systems/040_systems                       1.39     68.1±2.71µs        ? ?/sec     1.00     48.9±3.06µs        ? ?/sec
empty_systems/045_systems                       1.31     75.9±1.49µs        ? ?/sec     1.00     57.8±2.94µs        ? ?/sec
empty_systems/050_systems                       1.38     84.6±2.02µs        ? ?/sec     1.00     61.2±4.58µs        ? ?/sec
empty_systems/055_systems                       1.36     93.4±3.12µs        ? ?/sec     1.00     68.5±4.42µs        ? ?/sec
empty_systems/060_systems                       1.38    100.9±2.31µs        ? ?/sec     1.00     73.3±4.36µs        ? ?/sec
empty_systems/065_systems                       1.31    109.3±3.51µs        ? ?/sec     1.00     83.3±3.58µs        ? ?/sec
empty_systems/070_systems                       1.31    116.9±2.85µs        ? ?/sec     1.00     89.4±3.10µs        ? ?/sec
empty_systems/075_systems                       1.32    125.4±2.59µs        ? ?/sec     1.00     94.8±3.58µs        ? ?/sec
empty_systems/080_systems                       1.30    133.2±3.27µs        ? ?/sec     1.00    102.2±2.76µs        ? ?/sec
empty_systems/085_systems                       1.31    142.0±2.28µs        ? ?/sec     1.00    108.4±3.65µs        ? ?/sec
empty_systems/090_systems                       1.26    149.2±2.61µs        ? ?/sec     1.00    118.1±3.51µs        ? ?/sec
empty_systems/095_systems                       1.24    157.1±3.51µs        ? ?/sec     1.00    126.4±3.88µs        ? ?/sec
empty_systems/100_systems                       1.23    163.3±2.72µs        ? ?/sec     1.00    133.3±3.63µs        ? ?/sec
fake_commands/2000_commands                     1.00      6.8±0.02µs        ? ?/sec     1.04      7.1±0.08µs        ? ?/sec
fake_commands/4000_commands                     1.00     13.7±0.05µs        ? ?/sec     1.04     14.3±0.03µs        ? ?/sec
fake_commands/6000_commands                     1.00     20.7±0.10µs        ? ?/sec     1.04     21.4±0.05µs        ? ?/sec
fake_commands/8000_commands                     1.00     27.6±0.10µs        ? ?/sec     1.04     28.6±0.07µs        ? ?/sec
get_or_spawn/batched                            1.02   365.4±11.94µs        ? ?/sec     1.00   359.2±11.34µs        ? ?/sec
get_or_spawn/individual                         1.02   544.0±40.05µs        ? ?/sec     1.00   535.7±41.96µs        ? ?/sec
heavy_compute/base                              1.02    214.4±1.33µs        ? ?/sec     1.00    211.0±1.88µs        ? ?/sec
insert_commands/insert                          1.00   452.4±38.52µs        ? ?/sec     1.00   452.8±38.28µs        ? ?/sec
insert_commands/insert_batch                    1.00   364.1±14.96µs        ? ?/sec     1.00   365.2±13.51µs        ? ?/sec
insert_simple/base                              1.01    431.7±3.70µs        ? ?/sec     1.00    425.7±2.33µs        ? ?/sec
insert_simple/unbatched                         1.00   728.8±12.66µs        ? ?/sec     1.05   762.0±13.74µs        ? ?/sec
iter_fragmented/base                            1.02    340.9±4.03ns        ? ?/sec     1.00    335.6±7.29ns        ? ?/sec
iter_fragmented/foreach                         1.00   166.0±25.83ns        ? ?/sec     1.09   180.3±35.82ns        ? ?/sec
iter_fragmented/foreach_wide                    1.00      3.7±0.05µs        ? ?/sec     1.24      4.6±0.05µs        ? ?/sec
iter_fragmented/wide                            1.00      3.8±0.10µs        ? ?/sec     1.01      3.9±0.12µs        ? ?/sec
iter_fragmented_sparse/base                     1.00      7.6±0.21ns        ? ?/sec     1.46     11.1±0.86ns        ? ?/sec
iter_fragmented_sparse/foreach                  1.02      7.9±0.25ns        ? ?/sec     1.00      7.8±0.12ns        ? ?/sec
iter_fragmented_sparse/foreach_wide             1.00     39.2±0.43ns        ? ?/sec     1.61     63.0±0.41ns        ? ?/sec
iter_fragmented_sparse/wide                     1.00     42.3±2.07ns        ? ?/sec     1.01     42.7±0.81ns        ? ?/sec
iter_simple/base                                1.00      8.3±0.02µs        ? ?/sec     1.01      8.3±0.03µs        ? ?/sec
iter_simple/foreach                             1.01      8.5±0.01µs        ? ?/sec     1.00      8.4±0.02µs        ? ?/sec
iter_simple/foreach_sparse_set                  1.01     25.9±0.14µs        ? ?/sec     1.00     25.6±0.35µs        ? ?/sec
iter_simple/foreach_wide                        1.00     41.8±0.39µs        ? ?/sec     1.11     46.3±0.71µs        ? ?/sec
iter_simple/foreach_wide_sparse_set             1.13   130.0±57.31µs        ? ?/sec     1.00    114.5±0.91µs        ? ?/sec
iter_simple/sparse_set                          1.00     28.8±0.21µs        ? ?/sec     1.02     29.4±0.23µs        ? ?/sec
iter_simple/system                              1.00      8.3±0.02µs        ? ?/sec     1.00      8.3±0.02µs        ? ?/sec
iter_simple/wide                                1.00     39.7±0.83µs        ? ?/sec     1.03     40.8±1.26µs        ? ?/sec
iter_simple/wide_sparse_set                     1.01    128.1±0.81µs        ? ?/sec     1.00    126.4±1.22µs        ? ?/sec
no_archetypes/system_count/0                    1.00      6.1±0.13ns        ? ?/sec     1.00      6.1±0.06ns        ? ?/sec
no_archetypes/system_count/100                  1.28    164.2±2.62µs        ? ?/sec     1.00    128.5±4.39µs        ? ?/sec
no_archetypes/system_count/20                   1.49     39.4±0.67µs        ? ?/sec     1.00     26.5±1.16µs        ? ?/sec
no_archetypes/system_count/40                   1.60     67.0±1.84µs        ? ?/sec     1.00     41.9±2.38µs        ? ?/sec
no_archetypes/system_count/60                   1.55    100.7±1.53µs        ? ?/sec     1.00     65.1±3.83µs        ? ?/sec
no_archetypes/system_count/80                   1.36    132.4±1.73µs        ? ?/sec     1.00     97.5±3.17µs        ? ?/sec
query_get/50000_entities_sparse                 1.00    289.0±3.51µs        ? ?/sec     1.00    288.9±0.83µs        ? ?/sec
query_get/50000_entities_table                  1.00    261.5±1.57µs        ? ?/sec     1.00    262.2±1.24µs        ? ?/sec
query_get_component/50000_entities_sparse       1.00    694.2±4.82µs        ? ?/sec     1.01   697.8±18.39µs        ? ?/sec
query_get_component/50000_entities_table        1.01    610.7±3.81µs        ? ?/sec     1.00    604.3±4.82µs        ? ?/sec
query_get_component_simple/system               1.00    579.7±6.19µs        ? ?/sec     1.03  599.8±107.32µs        ? ?/sec
query_get_component_simple/unchecked            1.00    686.4±4.34µs        ? ?/sec     1.00    686.4±9.30µs        ? ?/sec
query_get_many_10/50000_calls_sparse            1.00      4.4±0.40ms        ? ?/sec     1.05      4.6±0.54ms        ? ?/sec
query_get_many_10/50000_calls_table             1.00      4.0±0.38ms        ? ?/sec     1.04      4.2±0.39ms        ? ?/sec
query_get_many_2/50000_calls_sparse             1.01   639.0±85.54µs        ? ?/sec     1.00   630.4±51.38µs        ? ?/sec
query_get_many_2/50000_calls_table              1.00   657.6±53.77µs        ? ?/sec     1.00   655.5±41.88µs        ? ?/sec
query_get_many_5/50000_calls_sparse             1.00      2.0±0.25ms        ? ?/sec     1.05      2.1±0.35ms        ? ?/sec
query_get_many_5/50000_calls_table              1.00  1790.3±76.26µs        ? ?/sec     1.01  1810.1±123.77µs        ? ?/sec
schedule/base                                   1.12     37.4±1.86µs        ? ?/sec     1.00     33.5±1.86µs        ? ?/sec
sized_commands_0_bytes/2000_commands            1.00      4.4±0.03µs        ? ?/sec     1.01      4.4±0.05µs        ? ?/sec
sized_commands_0_bytes/4000_commands            1.01      8.8±0.04µs        ? ?/sec     1.00      8.8±0.45µs        ? ?/sec
sized_commands_0_bytes/6000_commands            1.01     13.3±0.06µs        ? ?/sec     1.00     13.1±0.03µs        ? ?/sec
sized_commands_0_bytes/8000_commands            1.00     17.8±0.08µs        ? ?/sec     1.00     17.7±0.09µs        ? ?/sec
sized_commands_12_bytes/2000_commands           1.00      4.8±0.02µs        ? ?/sec     1.00      4.8±0.01µs        ? ?/sec
sized_commands_12_bytes/4000_commands           1.00      9.6±0.04µs        ? ?/sec     1.02      9.8±0.02µs        ? ?/sec
sized_commands_12_bytes/6000_commands           1.01     14.5±0.07µs        ? ?/sec     1.00     14.4±0.08µs        ? ?/sec
sized_commands_12_bytes/8000_commands           1.01     19.4±0.12µs        ? ?/sec     1.00     19.2±0.32µs        ? ?/sec
sized_commands_512_bytes/2000_commands          1.00     58.3±1.85µs        ? ?/sec     1.00     58.2±1.94µs        ? ?/sec
sized_commands_512_bytes/4000_commands          1.00    118.6±8.59µs        ? ?/sec     1.00    118.1±8.37µs        ? ?/sec
sized_commands_512_bytes/6000_commands          1.01   182.7±23.39µs        ? ?/sec     1.00   181.5±22.46µs        ? ?/sec
sized_commands_512_bytes/8000_commands          1.01   243.7±34.63µs        ? ?/sec     1.00   242.0±30.64µs        ? ?/sec
spawn_commands/2000_entities                    1.00    169.8±7.39µs        ? ?/sec     1.03    174.4±4.21µs        ? ?/sec
spawn_commands/4000_entities                    1.00   352.3±14.06µs        ? ?/sec     1.02    358.3±9.68µs        ? ?/sec
spawn_commands/6000_entities                    1.00   519.5±19.88µs        ? ?/sec     1.04   539.4±18.56µs        ? ?/sec
spawn_commands/8000_entities                    1.00   696.6±22.98µs        ? ?/sec     1.05   730.7±25.27µs        ? ?/sec
spawn_world/10000_entities                      1.00   825.2±71.83µs        ? ?/sec     1.04   855.0±76.43µs        ? ?/sec
spawn_world/1000_entities                       1.00     82.9±7.85µs        ? ?/sec     1.04     86.1±8.78µs        ? ?/sec
spawn_world/100_entities                        1.00      8.2±0.85µs        ? ?/sec     1.03      8.4±0.85µs        ? ?/sec
spawn_world/10_entities                         1.00   828.7±76.28ns        ? ?/sec     1.04   857.9±81.80ns        ? ?/sec
spawn_world/1_entities                          1.00     83.0±7.56ns        ? ?/sec     1.06     88.1±9.74ns        ? ?/sec
world_entity/50000_entities                     1.00    120.4±0.94µs        ? ?/sec     1.00    119.9±0.38µs        ? ?/sec
world_get/50000_entities_sparse                 1.01    202.8±0.96µs        ? ?/sec     1.00    201.5±0.73µs        ? ?/sec
world_get/50000_entities_table                  1.00    169.3±1.19µs        ? ?/sec     1.00    169.1±6.02µs        ? ?/sec
world_query_for_each/50000_entities_sparse      1.01     53.7±0.25µs        ? ?/sec     1.00     53.3±0.19µs        ? ?/sec
world_query_for_each/50000_entities_table       1.00     27.3±0.10µs        ? ?/sec     1.00     27.1±0.03µs        ? ?/sec
world_query_get/50000_entities_sparse           1.00     96.4±0.90µs        ? ?/sec     1.00     95.9±0.66µs        ? ?/sec
world_query_get/50000_entities_sparse_wide      1.02    195.3±1.77µs        ? ?/sec     1.00    191.8±0.86µs        ? ?/sec
world_query_get/50000_entities_table            1.00    154.5±1.01µs        ? ?/sec     1.00    155.2±1.07µs        ? ?/sec
world_query_get/50000_entities_table_wide       1.00    234.4±0.88µs        ? ?/sec     1.00    233.8±0.62µs        ? ?/sec
world_query_iter/50000_entities_sparse          1.00     54.3±0.28µs        ? ?/sec     1.01     54.9±6.69µs        ? ?/sec
world_query_iter/50000_entities_table           1.00     27.2±0.08µs        ? ?/sec     1.00     27.2±0.11µs        ? ?/sec

james7132 avatar Feb 18 '23 05:02 james7132

@inbetweennames, I'd appreciate your review on this :)

alice-i-cecile avatar Feb 18 '23 14:02 alice-i-cecile

Yeah this does make sense to me. The #6161 PR can be adapted to use the new style.

InBetweenNames avatar Feb 19 '23 01:02 InBetweenNames

This is getting moved to 0.11, and merged at the start of the cycle, due to the risk of badly breaking users in subtle ways.

alice-i-cecile avatar Feb 19 '23 03:02 alice-i-cecile

Is the plan still to merge this early on in the 0.11 cycle?

joseph-gio avatar Mar 30 '23 17:03 joseph-gio

Not exactly early anymore, and I still need to address Boxy's comments, but I'd like to get it in during 0.11.

james7132 avatar Mar 30 '23 18:03 james7132

Delaying this until 0.12 due to the implications this has for every query iteration and the potential soundness issues that might arise from a bug in the changes. It's best to get this in early in a release cycle rather than this late.

james7132 avatar May 23 '23 07:05 james7132

@james7132, unless I hear otherwise, I'm going to merge this tomorrow :)

alice-i-cecile avatar Nov 26 '23 17:11 alice-i-cecile

As a final sanity check, the codegen of this PR seems to show no tangible difference from what is in main right now: https://github.com/james7132/bevy_asm_tests/commit/309947cd078086b7edc4b8b5f29b1d04255b1b9a#diff-4c4b34cf83f523fced3bd396ad7ab8e228b4d35bf65c1f0457f7e4e58b14ccc5.

@alice-i-cecile 👍

james7132 avatar Nov 26 '23 19:11 james7132

Changelog needs updating btw. Also, this is the first I'm hearing of it, but using for_each() instead of for x in query is faster?

JMS55 avatar Nov 28 '23 04:11 JMS55

@james7132 sorry, you lost the coin flip on the merge conflicts :( Merge it yourself when you're done <3

alice-i-cecile avatar Nov 28 '23 04:11 alice-i-cecile

@JMS55 Roughly, for loops can break but Iterator::for_each cannot, so using the method can sometimes save some work by not needing to worry about that. Whether it matters depends greatly on the exact iterator type in question -- it doesn't at all for Range or slice iterators -- and the overhead is usually small, so for a loop body doing a chunky amount of work the overhead of for is usually immeasurably small.

But it's easy to come up with examples where the Query iterator is non-trivial and the work to be done is simple, and thus using for_each instead can be a nice perf gain.
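As a rough sketch of that difference using only the standard library (the function names are made up for illustration): Chain is an iterator where next() has to check which half is active on every call, while its fold override runs each half in its own loop. The third function shows the case where a for loop is still the natural fit, because it needs to stop early.

```rust
/// External iteration: the for loop calls next() once per element, and
/// Chain::next has to work out which half it is in every time.
fn sum_external(a: &[u64], b: &[u64]) -> u64 {
    let mut total = 0u64;
    for x in a.iter().chain(b.iter()) {
        total += *x;
    }
    total
}

/// Internal iteration: for_each is built on fold, and Chain's fold
/// handles the first half and then the second half in separate loops.
fn sum_internal(a: &[u64], b: &[u64]) -> u64 {
    let mut total = 0u64;
    a.iter().chain(b.iter()).for_each(|x| total += *x);
    total
}

/// Early exit: a for loop can return as soon as a match is found,
/// which for_each has no way to express.
fn first_over(a: &[u64], b: &[u64], limit: u64) -> Option<u64> {
    for &x in a.iter().chain(b.iter()) {
        if x > limit {
            return Some(x);
        }
    }
    None
}
```

sum_external and sum_internal always agree on the result. Any speed difference depends on the iterator and the loop body, so it is worth benchmarking before rewriting loops for this reason alone.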

scottmcm avatar Nov 30 '23 21:11 scottmcm