bevy
bevy copied to clipboard
improve compile time by type-erasing wgpu structs
Objective
structs containing wgpu types take a long time to compile. this is particularly bad for generics containing the wgpu structs (like the depth pipeline builder with #[derive(SystemParam)] i've been working on).
we can avoid that by boxing and type-erasing in the bevy render_resource wrappers.
type system magic is not a strength of mine so i guess there will be a cleaner way to achieve this, happy to take feedback or for it to be taken as a proof of concept if someone else wants to do a better job.
Solution
- add macros to box and type-erase in debug mode
- leave current impl for release mode
timings:
| current | |||
|---|---|---|---|
| Total time: | 64.9s | ||
| bevy_pbr v0.9.0-dev | 19.2s | ||
| bevy_render v0.9.0-dev | 17.0s | ||
| bevy_sprite v0.9.0-dev | 15.1s | ||
| DepthPipelineBuilder | 18.7s | ||
| with type-erasing | diff | ||
| Total time: | 49.0s | -24% | |
| bevy_render v0.9.0-dev | 12.0s | -38% | |
| bevy_pbr v0.9.0-dev | 8.7s | -49% | |
| bevy_sprite v0.9.0-dev | 6.1s | -60% | |
| DepthPipelineBuilder | 1.2s | -94% |
the depth pipeline builder is a binary with body:
use std::{marker::PhantomData, hash::Hash};
use bevy::{prelude::*, ecs::system::SystemParam, pbr::{RenderMaterials, MaterialPipeline, ShadowPipeline}, render::{renderer::RenderDevice, render_resource::{SpecializedMeshPipelines, PipelineCache}, render_asset::RenderAssets}};
fn main() {
println!("Hello, world p!\n");
}
#[derive(SystemParam)]
pub struct DepthPipelineBuilder<'w, 's, M: Material>
where M::Data: Eq + Hash + Clone,
{
render_device: Res<'w, RenderDevice>,
material_pipeline: Res<'w, MaterialPipeline<M>>,
material_pipelines: ResMut<'w, SpecializedMeshPipelines<MaterialPipeline<M>>>,
shadow_pipeline: Res<'w, ShadowPipeline>,
pipeline_cache: ResMut<'w, PipelineCache>,
render_meshes: Res<'w, RenderAssets<Mesh>>,
render_materials: Res<'w, RenderMaterials<M>>,
msaa: Res<'w, Msaa>,
#[system_param(ignore)]
_p: PhantomData<&'s M>,
}
I'm curious how this improves compile times. It looks like just as much functions need to be codegened, if not even more for the type erased variants.
I'm curious how this improves compile times. It looks like just as much functions need to be codegened, if not even more for the type erased variants.
yes it's weird, and probably something that should be improved in the compiler. as a guess, maybe the size for a struct containing generics needs to be recalculated for each generic combination, even when the generic is not a direct member (like in a box or an arc) and doesn't actually affect the size? the wgpu structs contain some very generic- and associated type-heavy code that is probably causing an exponential type explosion.
This PR doesn't compile in release on my mac
error[E0515]: cannot return reference to local variable `untyped`
--> crates/bevy_render/src/render_resource/resource_macros.rs:57:9
|
57 | &$value
| ^^^^^^^ returns a reference to data owned by the current function
|
::: crates/bevy_render/src/render_resource/pipeline_cache.rs:302:9
|
302 | render_resource_ref!(untyped, wgpu::PipelineLayout)
| --------------------------------------------------- in this macro invocation
|
= note: this error originates in the macro `render_resource_ref` (in Nightly builds, run with -Z macro-backtrace for more info)
This PR doesn't compile in release on my mac
fixed. will address the other feedback tomorrow if there's positive disposition.
This seems to be the main contributing factor for bevy_render compiling faster (70% of perf improvement):
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| Item | Self Time | Self Time Change | Time | Time Change | Item count | Cache hits | Blocked time | Incremental load time | Incremental hashing time |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| evaluate_obligation | -5.14754422s | -90.82% | -5.147455645s | -88.89% | -3939 | +0 | +0ns | +0ns | -1.646728ms |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
Additionally it saves ~30MB of ~200MB on the file size of of libbevy_render.rlib. Similar results for bevy_pbr (50% of perf improvement):
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| Item | Self Time | Self Time Change | Time | Time Change | Item count | Cache hits | Blocked time | Incremental load time | Incremental hashing time |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| evaluate_obligation | -8.455474125s | -95.79% | -8.477643881s | -94.68% | -8198 | +0 | +0ns | +0ns | -2.736299ms |
+-----------------------------------------------------+---------------+------------------+---------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
And saving ~50MB of ~140MB.
(measurements with single threaded compilation and incremental cache cleared)
maybe related: https://fasterthanli.me/articles/when-rustc-explodes#what-now
~sensitivy check: is BlackBox acceptable for a wrapper that contains a boxed type-erased value?~
In addition to the compilation benchmark, a runtime/render benchmark should be performed, so that it is known whether this has any impact here, due to missed optimizations.
Edit: Missed the #[cfg(debug_assertions)] on the macro, to only use this for debug builds. So no render benchmark necessary. :sweat:
you sent me down a rabbit hole of newtypes not being zero-cost before i read the edit :) ... apparently newtypes are only zero cost as long as they have a non-calloc style initialization, which would be true here since we're wrapping an arc which needs an initialized counter.
I thought it was needed so that we can take it in try_unwrap, and not subsequently decrement it on drop.
Maybe there is a better way, I’m not really sure how drop interoperates with a fn(mut self) that destructures the self.
On 28 Sep 2022, at 22:01, Carter Anderson @.@.>> wrote:
@cart commented on this pull request.
In crates/bevy_render/src/render_resource/resource_macros.rshttps://github.com/bevyengine/bevy/pull/5950#discussion_r982858371:
@@ -0,0 +1,108 @@ +// structs containing wgpu types take a long time to compile. this is particularly bad for generic +// structs containing wgpu structs. we avoid that in debug builds (and for cargo check and rust analyzer) +// by boxing and type-erasing with the
render_resource_wrappermacro. +// analysis from https://github.com/bevyengine/bevy/pull/5950#issuecomment-1243473071 indicates this is +// due toevaluate_obligations. we should check if this can be removed after a fix lands for +// https://github.com/rust-lang/rust/issues/99188 (and after otherevaluate_obligations-related changes). +#[cfg(debug_assertions)] +#[macro_export] +macro_rules! render_resource_wrapper {
- ($wrapper_type:ident, $wgpu_type:ty) => {
-
#[derive(Clone, Debug)] -
pub struct $wrapper_type(Option<std::sync::Arc<Box<()>>>);
I'm confused by the Option here. Why is this needed?
— Reply to this email directly, view it on GitHubhttps://github.com/bevyengine/bevy/pull/5950#pullrequestreview-1124352801, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AMCQEUSU5VDDTOEROX4U3KTWASWZTANCNFSM6AAAAAAQJ5QL3A. You are receiving this because you authored the thread.Message ID: @.***>
bors r+
Pull request successfully merged into main.
Build succeeded:
- build-and-install-on-iOS
- build-android
- build (macos-latest)
- build (ubuntu-latest)
- build-wasm
- build (windows-latest)
- build-without-default-features (bevy)
- build-without-default-features (bevy_ecs)
- build-without-default-features (bevy_reflect)
- check-compiles
- check-doc
- check-missing-examples-in-docs
- ci
- markdownlint
- run-examples
- run-examples-on-wasm
- run-examples-on-windows-dx12