rustc_codegen_cranelift
Attempting to compile `bevy_reflect` with cranelift causes STATUS_ACCESS_VIOLATION during compilation
Compiling on Windows with the latest CI artifact build.
Full error message:
Compiling bevy_reflect v0.7.0
error: could not compile `bevy_reflect`
Caused by:
process didn't exit successfully: `rustc --crate-name bevy_reflect --edition=2021 C:\Users\elect\.cargo\registry\src\github.com-1ecc6299db9ec823\bevy_reflect-0.7.0\src\lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no -C debuginfo=2 -C debug-assertions=on -C metadata=93cc29b344f5fd9d -C extra-filename=-93cc29b344f5fd9d --out-dir C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps -L dependency=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps --extern bevy_reflect_derive=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps\bevy_reflect_derive-7a7ec19481b36f7b.dll --extern bevy_utils=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps\libbevy_utils-638c5a2ec5f001df.rmeta --extern downcast_rs=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps\libdowncast_rs-f56d5e827a193966.rmeta --extern erased_serde=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps\liberased_serde-9c11269d65f6973f.rmeta --extern parking_lot=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps\libparking_lot-63f537677751ed65.rmeta --extern serde=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps\libserde-b6c12e91e4361d2b.rmeta --extern thiserror=C:\Assets\ProgrammingProjects\Rust\testing\target\debug\deps\libthiserror-546b84b988865560.rmeta --cap-lints allow -Cpanic=abort -Zpanic-abort-tests -Zcodegen-backend=C:\users\elect\desktop\cg_clif\build\bin\rustc_codegen_cranelift.dll --sysroot C:\users\elect\desktop\cg_clif\build -L native=C:\Users\elect\.cargo\registry\src\github.com-1ecc6299db9ec823\winapi-x86_64-pc-windows-gnu-0.4.0\lib` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)
Not sure what's causing this. Happens consistently, so it's not random, but I have no idea what's in bevy_reflect that causes it. It's a pretty simple crate, doesn't do anything particularly weird.
Unfortunately this crate is depended on by the most important bevy crates, so bevy as a whole can't be compiled on cranelift. Not certain if this is limited to windows/just my PC, unable to test it anywhere else. Going to try and play with it to see if I can learn anything about it.
EDIT: This appears to not apply to the main branch of bevy. Instead, the main branch has the same error on bevy_hierarchy.
EDIT 2: This appears to also not apply to compiling the main branch directly, rather than as a git dependency. That works fine.
EDIT 3: After some testing, these appear to be the conditions of the crash:
- There must be a tuple struct type `T` present in a dependency with the following conditions (named structs and any enum types do not cause this issue):
  - The type must contain a field which is an instantiation of a generic type, in which the instantiation passes a slice or array type of (seemingly) any type or length directly. Passing a `&[T]` does not trigger it; only a `[T]`. As such this does not occur if nesting types, unless the nesting is to produce a multidimensional array. This also does not occur if the field itself is an array or slice. Not all dynamically sized types cause this issue; `str` is fine. Not all types taking a const generic parameter cause this issue; a `Custom<const T: u32>` is fine.
  - The type must `#[derive(Component)]`, referring to `bevy_ecs`'s `Component` trait. Manual impls do not cause this issue. The manual impl is as follows:
impl Component for T {
    type Storage = TableStorage;
}
No other derive I tested causes this issue.
This issue occurs for the Children type in bevy_hierarchy which has the following definition:
/// Contains references to the child entities of this entity
#[derive(Component, Debug, Reflect)]
#[reflect(Component, MapEntities)]
pub struct Children(pub(crate) SmallVec<[Entity; 8]>);
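The conditions listed above can be sketched as a set of type shapes. This is only an illustration and does not reproduce the crash, since the crash was only seen with the real `#[derive(Component)]` from `bevy_ecs`. Everything here is an assumption made for the sketch: `Wrapper` is a hypothetical stand-in for a generic container such as `SmallVec`, `Component` and `TableStorage` are local stand-ins for the `bevy_ecs` items, and `u32` is an arbitrary element type.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a generic container such as `SmallVec<A>`.
pub struct Wrapper<A: ?Sized>(PhantomData<A>);

/// Local stand-ins for the `bevy_ecs` trait and storage type (assumptions).
pub trait Component {
    type Storage;
}
pub struct TableStorage;

// Shapes reported to crash when the tuple struct derives `Component`:
pub struct ArrayArg(pub Wrapper<[u32; 8]>); // array passed directly to a generic
pub struct SliceArg(pub Wrapper<[u32]>); // slice passed directly to a generic
pub struct MultiDim(pub Wrapper<[[u32; 2]; 2]>); // multidimensional array

// Shapes reported as fine, even with the derive:
pub struct RefSliceArg(pub Wrapper<&'static [u32]>); // `&[T]` instead of `[T]`
pub struct NestedArg(pub Wrapper<Wrapper<[u32; 8]>>); // array nested one level deeper
pub struct StrArg(pub Wrapper<str>); // a different dynamically sized type
pub struct PlainArray(pub [u32; 8]); // the field itself is an array
pub struct Named {
    pub inner: Wrapper<[u32; 8]>, // named struct instead of tuple struct
}
pub struct Custom<const T: u32>;
pub struct ConstArg(pub Custom<3>); // const generic argument

// The manual impl, reported not to trigger the issue.
impl Component for ArrayArg {
    type Storage = TableStorage;
}
```

`Children(pub(crate) SmallVec<[Entity; 8]>)` has the same shape as `ArrayArg` here: a tuple struct whose field passes an array type directly to a generic type.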
I was unable to identify the problem in `bevy_reflect`. There are no instances of `#[derive(Component)]`, but there are many other macros which are significantly more complex than `Component`.
Are you able to attach a debugger and get a backtrace when it crashes?
Yes, but I'm not sure what to look for. It might be hitting a UD2 instruction but I don't really think so, that's about all that stands out to me. The stack is just a bunch of offsets in the assembly. I tried building from source with --debug to see if that would put debug info back in, but that doesn't seem to have done anything.
Edit: Also just noticed that jit and lazy-jit have the same issue with a hello world project.
I'm probably not going to get around to debugging this myself soon, as I don't have a Windows dev environment for Rust set up. I unfortunately also don't have a clue where the bug could be, so I can't do a blind attempt at fixing it either. Linux has the best support as that is what I use myself. *BSD also works really well as it is close enough to Linux. macOS should work too most of the time, but support is a tiny bit worse as it uses a different object file format. Windows support is not the best, as you noticed. Only the MinGW toolchain is supported at all.
That's unfortunate. I don't have the expertise to debug any further than I have, but hopefully someone will be able to pick this up at a later date.
https://github.com/bjorn3/rustc_codegen_cranelift/pull/1255 shows an ABI incompatibility with MSVC. Debugging that one should probably be easier than this.
Hey @PROMETHIA-27, would you be able to test this branch to see if it fixes the issues you are experiencing?
It contains a fix for an ABI issue that sounds suspiciously like what you are hitting.
I was unable to build the project (kept getting `error: the 'cargo.exe' binary, normally provided by the 'cargo' component, is not applicable to the 'nightly-2022-07-27-x86_64-pc-windows-gnu' toolchain`), but I grabbed an artifact off of CI and that worked! Looks like bevy can now run on cargo_cranelift on Windows. Unfortunately jit and lazy-jit seem to still just STATUS_ACCESS_VIOLATION rustc while compiling. Maybe it's not properly removing the execution protection on the allocations it uses to emit code?
> but I grabbed an artifact off of CI and that worked!
That's great! 🎉
> Unfortunately jit and lazy-jit seem to still just STATUS_ACCESS_VIOLATION rustc while compiling. Maybe it's not properly removing the execution protection on the allocations it uses to emit code?
I haven't really tested anything with the jit; I think I disabled it early on for Windows when porting the test suite into Rust.
Hopefully it's something simple 😄
> but I grabbed an artifact off of CI and that worked!
Nice!
> Unfortunately jit and lazy-jit seem to still just STATUS_ACCESS_VIOLATION rustc while compiling. Maybe it's not properly removing the execution protection on the allocations it uses to emit code?
Cranelift-jit should set the mappings to RX before an attempt at executing the code is made.
We do run the cranelift test suite on the JIT, and it runs fine on Windows. But I don't know much beyond that.
@PROMETHIA-27 Did https://github.com/bjorn3/rustc_codegen_cranelift/pull/1284 fix this?
So:
- I was able to actually build the project this time, after clearing my toolchains out and letting it reinstall the override one. I guess that was on my side
- Building hello world works fine
- Building with bevy as a dependency works, but trying to run a default app segfaults
- Both jit modes segfault while compiling on a hello_world project
- At least regular jit (probably lazy too) fails to compile with bevy as a dependency because of static libs (this seems intended, but I figured it was worth noting since this implies that it segfaults sometime after this step, which occurs after every crate is compiled)
I think that's a regression overall? IIRC I had the default app actually running last time
> Building with bevy as a dependency works, but trying to run a default app segfaults
Yeah, this sounds like a regression. I'll try to reproduce that locally.
The JIT segfaulting is also something I'm able to reproduce, but it was segfaulting before, so I don't think that one is a regression.
I'm having some issues reproducing this. I've cloned bevy, and ran
..\rustc_codegen_cranelift\build\cargo-clif.exe run -v --example breakout
This compiles fine all the way until the last linking step, and then it seems to get stuck there. The linker is using 100% of one CPU core, and has been running for the last 10 minutes. Did you also run into this?
I tried both the latest and main branches. I also tried `--example rotation` and `-Zshare-generics=off` because I saw some users with a linker error. But all of those yield the same issue.
build output:
...
Compiling bevy v0.9.0-dev (C:\Users\Afonso\CLionProjects\bevy)
Running `rustc --crate-name breakout --edition=2021 examples/games/breakout.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debuginfo=2 --cfg "feature=\"animation\"" --cfg "feature=\"bevy_asset\"" --cfg "feature=\"bevy_audio\"" --cfg "feature=\"bevy_gilrs\"" --cfg "feature=\"bevy_scene\"" --cfg "feature=\"bevy_winit\"" --cfg "feature=\"default\"" --cfg "feature=\"filesystem_watcher\"" --cfg "feature=\"hdr\"" --cfg "feature=\"png\"" --cfg "feature=\"render\"" --cfg "feature=\"vorbis\"" --cfg "feature=\"x11\"" -C metadata=a5f9a42575642687 --out-dir C:\Users\Afonso\CLionProjects\bevy\target\debug\examples -C incremental=C:\Users\Afonso\CLionProjects\bevy\target\debug\incremental -L dependency=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps --extern anyhow=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libanyhow-15561fa7fc3f4ef7.rlib --extern bevy=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libbevy-50ceb663272dd135.rlib --extern bevy_internal=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libbevy_internal-34cbf0c23f394258.rlib --extern bytemuck=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libbytemuck-edb0d53af52ff1a8.rlib --extern crossbeam_channel=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libcrossbeam_channel-badd8b417dd3f093.rlib --extern futures_lite=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libfutures_lite-e798f34ff12dff42.rlib --extern rand=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\librand-240071f861a3028b.rlib --extern ron=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libron-95741b88d4d68ccd.rlib --extern serde=C:\Users\Afonso\CLionProjects\bevy\target\debug\deps\libserde-14009cd3ea1fb87c.rlib -Cpanic=abort -Zpanic-abort-tests -Zcodegen-backend=C:\Users\Afonso\CLionProjects\rustc_codegen_cranelift\build\bin\rustc_codegen_cranelift.dll --sysroot C:\Users\Afonso\CLionProjects\rustc_codegen_cranelift\build -L native=C:\Users\Afonso\.cargo\registry\src\github.com-1ecc6299db9ec823\windows_x86_64_msvc-0.36.1\lib -L native=C:\Users\Afonso\.cargo\registry\src\github.com-1ecc6299db9ec823\windows_x86_64_msvc-0.37.0\lib`
Building [=======================> ] 277/278: breakout(example)
Edit: OK, so I re-read this issue and it looks like you are using a hello world application (presumably this?) and compiling bevy as a dependency.
When I tried to do that, everything compiled and ran as expected.
Here's a repo with what I tried. Can you test that and see if it crashes for you?
I tested two projects: one was hello world,
fn main() {
    println!("Hello, world!");
}
and the other was a bevy default app.
use bevy::prelude::*;

fn main() {
    App::new().add_plugins(DefaultPlugins).run();
}
Your repo does seem to work fine.
Right, with the default app, I run into the stuck-linker issue. 😕
I'm going to try and troubleshoot that.
So, I let this run for a while with the default app, and it turns out it's just really, really slow: 18 minutes to compile. 🧐
Compiling bevy_hello_world v0.1.0 (C:\Users\Afonso\CLionProjects\bevy_hello_world)
Finished dev [unoptimized + debuginfo] target(s) in 18m 09s
Running `target\debug\bevy_hello_world.exe`
thread 'main' panicked at 'Initializing the event loop outside of the main thread is a significant cross-platform compatibility hazard. If you really, absolutely need to create an EventLoop on a different thread, please use the `EventLoopExtWindows::new_any_thread` function.', C:\Users\Afonso\.cargo\registry\src\github.com-1ecc6299db9ec823\winit-0.26.1\src\platform_impl\windows\event_loop.rs:139:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\debug\bevy_hello_world.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)
PS C:\Users\Afonso\CLionProjects\bevy_hello_world>
But at least I can reproduce something, I'll track this down and see where it gets. Is that the error that you are also seeing?
Edit: It seems some people are also hitting this with LLVM, so it may be an issue with the VS version I'm using
Edit 2: I replaced the default linker with rust-lld.exe and it compiled and ran the default app fine, can you test this branch?
Edit 3: (sorry I don't want to spam notifications)
I re-tried the bevy example breakout with the rust-lld.exe trick and it compiled and ran fine on my system! 🎉
I wonder if the issues we are seeing are all related to MSVC linker stuff? I'm using VS2019 with this linker version:
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community>link.exe
Microsoft (R) Incremental Linker Version 14.29.30146.0
Copyright (C) Microsoft Corporation. All rights reserved.
No, the error I got was a segfault (or Windows' equivalent, STATUS_ACCESS_VIOLATION) and it didn't take an abnormal amount of time to compile. Switching to LLD (like this) does not change this. My MSVC linker's not in quite the same place as yours (it's deeper in that same directory), but it's v14.32.31332.0.
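For telling the two failures in this thread apart: both exit codes are Windows NTSTATUS values, where the top two bits encode the severity (`0b11` means error). Below is a minimal sketch that names the two codes seen here. The helper names `ntstatus_name` and `is_error_severity` are hypothetical, and the table deliberately covers only the codes quoted in this issue.

```rust
/// Hypothetical helper: names the two NTSTATUS exit codes quoted in this thread.
/// Any other code returns `None`; this is not a complete NTSTATUS table.
pub fn ntstatus_name(code: u32) -> Option<&'static str> {
    match code {
        0xC000_0005 => Some("STATUS_ACCESS_VIOLATION"),
        0xC000_001D => Some("STATUS_ILLEGAL_INSTRUCTION"),
        _ => None,
    }
}

/// NTSTATUS keeps its severity in the top two bits; `0b11` marks an error.
pub fn is_error_severity(code: u32) -> bool {
    code >> 30 == 0b11
}
```

With this, `ntstatus_name(0xc0000005)` gives the access violation reported for rustc and for the default app, while `ntstatus_name(0xc000001d)` gives the illegal instruction that followed the winit panic.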
It looks like you are using VS2022, which is the latest version. I downloaded and installed VS2022 (14.33.31630.0) and rebuilt everything with it. It took 15 minutes to link and ran into the same issue as before. 😕
~~On an interesting note, I was able to compile and run the bevy breakout example in 2min, so its no longer hitting that issue there.~~ (Edit: Nevermind, I left the rust-lld.exe in there by accident, without it, it still takes a long time.)
I'm a bit at a loss here... Can you send a backtrace or run it in WinDbg to see where it's crashing?
I took a look with WinDbg but I still don't know what I'm looking for. It seemed to crash on a function pointer call? It segfaults on `call rdx`, where `rdx` has the value `0x00007ff7f687a7dc`. Across different builds this is mostly the same, although I think it was `rbx` once and a slightly different but mostly similar offset. The previous line was `mov rdx, qword ptr [7FF7F7FBCC78h]` or something very similar both times.
The call stack doesn't seem useful, it's just one frame named `[0x0] cranelift_test + 0x3579e41`.
Nothing else pops out at me as being useful/relevant. Is cranelift able to build .pdbs? I think that would be useful but it doesn't seem to. Or embed more debug info. Normally windbg can actually point back to source code from assembly, but with cranelift builds it seems to be completely lacking any debug info.
While messing around with the loaded plugins I managed to get it to crash with a panic, which also seemed to raise an ILLEGAL_INSTRUCTION error, after the panic printed.
I narrowed down the source of the segfault to LogPlugin; if it's added before all other plugins (the default behavior with DefaultPlugins) it will segfault. It seems like aside from normal dependency panics, this is the only real issue with the default plugins/reordering them. I was unable to determine what ordering characteristics cause the segfault, because moving other plugins around moves where LogPlugin must be to prevent a crash, but primarily it's safest to add it as late as possible.
I don't have a clue why, because it doesn't seem particularly unsafe or complex. With the tracing feature on it can do some panic hook manipulation, but that's off by default. However, I did notice that it seemed to be particularly likely to crash when added before rendering related plugins like TextPlugin, WinitPlugin, and RenderPlugin. That might just be coincidence though.
Edit: Also tried getting a backtrace with RUST_BACKTRACE=1, didn't work.
Edit 2: The problem is even narrower than I thought! It's crashing at app construction time, not when the app is running. Just adding default plugins and not calling `.run()` crashes with a segfault. So something that LogPlugin is doing in the build function is the cause.
> So something that LogPlugin is doing in the build function is the cause.
Not necessarily. If you have a miscompilation, minor unrelated changes can cause the miscompiled part to flip between crashing and not crashing, for example due to a different value in a register somewhere or due to different code layout. Unfortunately miscompilations are pretty hard to debug. Right now I'm debugging a miscompilation that happens on the main branch of Cranelift (not 0.88.1, which cg_clif uses right now; 0.88.0 had another miscompilation on AArch64, so I had to skip it).
> Is cranelift able to build .pdbs? I think that would be useful but it doesn't seem to.
I do have a pdb next to the exe on the target dir, but to be honest I've never checked if they work. I did notice that a regular rust pdb for the bevy_hello_world has 350MiB but the ones produced by cg_clif have 50MiB, so maybe we aren't including a lot of info on it?
I tried a bunch of plugin combinations and different orders on my machine, but none of them seemed to cause that issue.
Yesterday I had the thought "Hey, maybe everything magically works on my machine, let's try a different one", so I added CI today to the bevy_hello_world repo. And... it works, so that's not very helpful.
But then another thing occurred to me: in that CI run I'm using the cg_clif artifacts from CI, and the last time you tried and it worked, it was also a CI artifact. So maybe that figures into it? Could you try running it on your machine with the CI artifacts?
Looking at the commit log between the last version that worked and now, the other thing that jumps out at me is the parallelization effort by @bjorn3. Although I think that would be unlikely, since it looks like you can reproduce this consistently.
Either way, I tried a bunch of `-j` values (which I think is the correct flag to control the jobserver... but I'm not too sure) to see if it would crash, but no luck there.
Another thing that is different: there are some slight differences between the stack probing code that you tried last time (and which worked) and the code that was actually included upstream, so maybe there is a bug there? If the artifact thing fails, I'm going to try to revert to an old version of cranelift with the new stack probing code to see if that triggers something.
Thanks for all your effort debugging this! It's been very helpful in narrowing things down!
> I do have a pdb next to the exe on the target dir, but to be honest I've never checked if they work. I did notice that a regular rust pdb for the bevy_hello_world has 350MiB but the ones produced by cg_clif have 50MiB, so maybe we aren't including a lot of info on it?
cg_clif only knows how to generate DWARF debuginfo. As such there is no debuginfo on Windows. Not sure why a pdb that large is generated. If you know of any crate to generate CodeView debuginfo, please let me know.
> Which I think is the correct flag to control the jobserver... But I'm not too sure
It is.
It's been a few minutes since I started building with the msvc artifact, so I think you're onto something. Will update when it's finished but I suspect that because of the toolchain override, when I built it built a gnu toolchain cargo-clif, but I use stable-msvc normally. So I can't imagine that was a nice interaction. Next I'll try building with my build on stable-gnu and see if that works.
EDIT: Should've waited one more minute. Just finished, in 11min, and I got the same event loop not-main-thread error that was mentioned. My build did not work, but something interesting I noticed is that either artifact force installs a matching nightly toolchain when run, while my build does not. The gnu artifact not only built but ran perfectly.
> Should've waited one more minute. Just finished, in 11min, and I got the same event loop not-main-thread error that was mentioned.
That sounds like exactly what I've been seeing when linking with the default linker. Using rust-lld.exe makes the issue go away. I think that is a separate issue, although it's worth investigating later.
> when I built it built a gnu toolchain cargo-clif, but I use stable-msvc normally. So I can't imagine that was a nice interaction. Next I'll try building with my build on stable-gnu and see if that works.
Building x86_64-pc-windows-gnu from x86_64-pc-windows-msvc does have some issues. I've had to disable some tests when I enabled that in CI.
But if I understand this correctly, you had an x86_64-pc-windows-gnu cg_clif cross-compiling to an x86_64-pc-windows-msvc target, right? (I also haven't tested that configuration.)
> My build did not work, but something interesting I noticed is that either artifact force installs a matching nightly toolchain when run, while my build does not. gnu artifact not only built but ran perfectly.
I'm not too sure about why this happens.
I had built cg_clif with x86_64-pc-windows-gnu and then attempted to use it with an x86_64-pc-windows-msvc toolchain, if that answers your question. I thought that the problem would be that the build requires those force-installed toolchains, but that's not the case either; none of the toolchains involved work. Seems like the build is just completely broken.
If you build cg_clif with x86_64-pc-windows-gnu it should get an x86_64-pc-windows-gnu sysroot and thus error when trying to use x86_64-pc-windows-msvc. How are you even mixing them? `--target x86_64-pc-windows-msvc`? Could you try building cg_clif with x86_64-pc-windows-msvc?
I just set my default toolchain to stable-msvc and then call cargo-clif run. And no, I can't build it with msvc; it says this:
The MSVC toolchain is not yet supported by rustc_codegen_cranelift.
Switch to the MinGW toolchain for Windows support.
Hint: You can use `rustup set default-host x86_64-pc-windows-gnu` to
set the global default target to MinGW
That error has been removed in https://github.com/bjorn3/rustc_codegen_cranelift/pull/1284, which I merged moments before I posted my message asking whether it works now. Can you try updating cg_clif to the latest commit to see if it works now?
Weird... seems I messed up git and was working off the old version the whole time. It's still building but it seems like it works like the artifact now. Edit: Once again, should've just waited one more minute. Same result as the msvc artifact.