Differentiating functions with Box<dyn Trait> args fails

Open motabbara opened this issue 11 months ago • 15 comments

Please see https://fwd.gymni.ch/eTJnUQ

It fails with "Attempting to call an indirect active function whose runtime value is inactive".

#![feature(autodiff)]
use std::autodiff::autodiff;
use std::fmt;

#[derive(Debug)]
struct Foo {
    pub test: f64
}

pub trait Cool: fmt::Debug {
    fn gen(&self) -> f64;
}

impl Cool for Foo {
    fn gen(&self) -> f64 {
        self.test * self.test
    }
}


#[autodiff(dsquare, Reverse, Duplicated, Duplicated)]
pub fn square(num: &Foo, result: &mut f64) {
    *result = num.gen()
}

#[autodiff(dsquare2, Reverse, Duplicated, Duplicated)]
pub fn square2(num: &Box<dyn Cool>, result: &mut f64) {
    *result = num.gen()
}

Incidentally, generic functions fail to differentiate even without the Box, e.g.,

#[autodiff(dsquare3, Reverse, Duplicated, Duplicated)]
pub fn square3<U: Cool>(num: &U, result: &mut f64) {
    *result = num.gen()
}

motabbara avatar Jan 18 '25 11:01 motabbara

@ZuseZ4, any recommendations about where in the codebase to look to examine calling traits through Box? Happy to attempt to try something myself with some pointers.

motabbara avatar Jan 20 '25 05:01 motabbara

I'm currently traveling, but I'll be back at my laptop on the 23rd; then I can look closer at the runtime inactivity. In the meantime, if you have a local build, can you run cargo +enzyme expand and post the autodiff macro expansions? Otherwise there might be flags to get the output from the explorer.

Support for generics should be easy to add; we had support in an earlier implementation. You need to adjust the frontend to not error on generics, and adjust the autodiff function body to call the generic primal function. I will look up the two locations in my frontend PR that you'd need to modify for that.

ZuseZ4 avatar Jan 20 '25 12:01 ZuseZ4

Here it is:

#![feature(prelude_import)]
#![feature(autodiff)]
#[prelude_import]
use std::prelude::rust_2021::*;
#[macro_use]
extern crate std;
use std::autodiff::autodiff;
use std::fmt;
struct Foo {
    pub test: f64,
}
#[automatically_derived]
impl ::core::fmt::Debug for Foo {
    #[inline]
    fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result {
        ::core::fmt::Formatter::debug_struct_field1_finish(f, "Foo", "test", &&self.test)
    }
}
pub trait Cool: fmt::Debug {
    fn gen(&self) -> f64;
}
impl Cool for Foo {
    fn gen(&self) -> f64 {
        self.test * self.test
    }
}
#[rustc_autodiff]
#[inline(never)]
pub fn square(num: &Foo, result: &mut f64) {
    *result = num.gen();
}
#[rustc_autodiff(Reverse, Duplicated, Duplicated, None)]
#[inline(never)]
pub fn dsquare(num: &Foo, dnum: &mut Foo, result: &mut f64, dresult: &mut f64) {
    unsafe {
        asm!("NOP", options(pure, nomem));
    };
    ::core::hint::black_box(square(num, result));
    ::core::hint::black_box((dnum, dresult));
}
#[rustc_autodiff]
#[inline(never)]
pub fn square2(num: &Box<dyn Cool>, result: &mut f64) {
    *result = num.gen();
}
#[rustc_autodiff(Reverse, Duplicated, Duplicated, None)]
#[inline(never)]
pub fn dsquare2(
    num: &Box<dyn Cool>,
    dnum: &mut Box<dyn Cool>,
    result: &mut f64,
    dresult: &mut f64,
) {
    unsafe {
        asm!("NOP", options(pure, nomem));
    };
    ::core::hint::black_box(square2(num, result));
    ::core::hint::black_box((dnum, dresult));
}
fn main() {
    for i in 0..5 {
        let mut d_foo = Foo { test: 0.0 };
        let f = Foo { test: i as f64 };
        let mut c = 0.0;
        let mut d_c = 1.0;
        let r = dsquare(&f, &mut d_foo, &mut c, &mut d_c);
        {
            ::std::io::_print(format_args!("d_foo {0:?}\n", d_foo));
        };
        let mut d_foo: Box<dyn Cool> = Box::new(Foo { test: 0.0 });
        let f: Box<dyn Cool> = Box::new(Foo { test: i as f64 });
        let r = dsquare2(&f, &mut d_foo, &mut c, &mut d_c);
        {
            ::std::io::_print(format_args!("d_foo {0:?}\n", d_foo));
        };
    }
}

motabbara avatar Jan 21 '25 08:01 motabbara

@wsmoses Any suggestions?

ZuseZ4 avatar Feb 06 '25 05:02 ZuseZ4

I’ve been looking into this Box<dyn Trait> differentiation failure - it’s an interesting one. The error - "Attempting to call an indirect active function whose runtime value is inactive" - suggests something’s off with how autodiff handles dynamic dispatch. For square with a concrete Foo, the macro can resolve num.gen() statically - no problem there. But with square2 and Box<dyn Cool>, it’s all runtime vtable lookups - maybe that’s where the macro or runtime loses track of the computation graph.
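To make the static versus dynamic distinction concrete, here is a minimal stable-Rust sketch without autodiff. It calls the same trait method through a concrete type and through a Box<dyn Cool>, and uses a central finite difference through the trait object as a reference for what dsquare2 would be expected to compute (2 * test). The names square_static, square_dyn and fd_gradient are made up for illustration, and the trait method is renamed to eval because gen is a reserved keyword in the 2024 edition.

```rust
use std::fmt;

#[derive(Debug)]
struct Foo {
    test: f64,
}

// Same shape as `Cool` in the report; the method is called `eval` here
// (an assumption of this sketch) because `gen` is reserved in edition 2024.
trait Cool: fmt::Debug {
    fn eval(&self) -> f64;
}

impl Cool for Foo {
    fn eval(&self) -> f64 {
        self.test * self.test
    }
}

// Static dispatch: the callee `Foo::eval` is known at compile time.
fn square_static(num: &Foo) -> f64 {
    num.eval()
}

// Dynamic dispatch: the callee is loaded from the vtable at run time,
// so the call is indirect.
fn square_dyn(num: &Box<dyn Cool>) -> f64 {
    num.eval()
}

// Central finite difference evaluated through the trait object.
// This is only a numerical reference, not what Enzyme does.
fn fd_gradient(x: f64, h: f64) -> f64 {
    let hi: Box<dyn Cool> = Box::new(Foo { test: x + h });
    let lo: Box<dyn Cool> = Box::new(Foo { test: x - h });
    (square_dyn(&hi) - square_dyn(&lo)) / (2.0 * h)
}

fn main() {
    for i in 0..5 {
        let x = i as f64;
        let boxed: Box<dyn Cool> = Box::new(Foo { test: x });
        println!(
            "x = {}: static = {}, dyn = {}, fd gradient = {}",
            x,
            square_static(&Foo { test: x }),
            square_dyn(&boxed),
            fd_gradient(x, 1e-3)
        );
    }
}
```

Both paths compute the same primal value; only the way the call target is resolved differs, and that indirect call is what the Enzyme error message refers to.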

Looking at the expanded code - dsquare2 calls square2 via black_box, but I’m wondering if the trait object’s indirection breaks the "active" state tracking needed for gradients. Is this a limitation in the frontend where the macro processes dyn Trait? Or is it lower down - maybe in the backend with how rustc_autodiff interacts with Rust/LLVM for these cases?

@ZuseZ4 - Any suggestions on where in the codebase to start investigating trait object support? I’m guessing either the macro expansion logic - or wherever runtime activity is managed - but I’m not sure where to focus. If there’s a simple test I could run - like bypassing dynamic dispatch to narrow it down - I’d be happy to try it with some guidance. Also, you noted generics needing frontend tweaks - would trait objects follow a similar fix?
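One way such a "bypass dynamic dispatch" test could look, as a sketch only: recover the concrete type from the trait object and route the call through a function that takes &Foo, which is the case that already differentiates. The as_any helper, square_concrete and square_boxed are hypothetical names added for this example (they are not part of the original trait or of std::autodiff), and the method is again called eval instead of gen. In a real experiment, square_concrete would be the function carrying the #[autodiff] attribute.

```rust
use std::any::Any;
use std::fmt;

#[derive(Debug)]
struct Foo {
    test: f64,
}

trait Cool: fmt::Debug {
    fn eval(&self) -> f64;
    // Hypothetical helper (not in the original trait) that exposes the
    // concrete type behind the trait object.
    fn as_any(&self) -> &dyn Any;
}

impl Cool for Foo {
    fn eval(&self) -> f64 {
        self.test * self.test
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Concrete entry point: with an Enzyme-enabled toolchain this is where
// the #[autodiff] attribute would go (omitted so the sketch builds on stable).
fn square_concrete(num: &Foo, result: &mut f64) {
    *result = num.eval();
}

// Returns true if the trait object was a `Foo` and the concrete,
// statically dispatched path was taken; false otherwise.
fn square_boxed(num: &Box<dyn Cool>, result: &mut f64) -> bool {
    match num.as_any().downcast_ref::<Foo>() {
        Some(foo) => {
            square_concrete(foo, result);
            true
        }
        None => false,
    }
}

fn main() {
    let boxed: Box<dyn Cool> = Box::new(Foo { test: 3.0 });
    let mut result = 0.0;
    if square_boxed(&boxed, &mut result) {
        println!("concrete path: result = {}", result);
    } else {
        println!("not a Foo: {:?}", boxed);
    }
}
```

This only narrows the problem down: if the concrete path differentiates and the trait-object path does not, the indirect call is the likely culprit. It is not a general fix, since every implementor has to be listed by hand.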

I’d like to dig deeper into this. Thanks for any pointers you can share!

KMJ-007 avatar Apr 04 '25 06:04 KMJ-007

@KMJ-007 The error message comes from Enzyme:

➜  enzyme git:(a35f4f7) rg "Attempting to call an indirect active"                                         
Enzyme/AdjointGenerator.h
4938:            "Attempting to call an indirect active function "
5279:              "Attempting to call an indirect active function "

but I don't see explanations on the website (enzyme.mit.edu), or on the code next to it. You can generally verify if something is caused by Enzyme by generating an LLVM reproducer as described here: https://enzyme.mit.edu/rust/debug_backend.html#reporting-backend-crashes

I just remembered that a lot of people were confused about this in the past, and I did find a few issues from users (including me, lol) asking about it. If you learn anything about it, please make a PR against github.com/EnzymeAD/rustbook to update our docs, either under chapter 4 or 12. Even if we don't find a full solution directly, we'd want to make sure the next person spending time on it doesn't have to start from zero.

To figure out how to use it, I'd recommend starting by lowering a reproducer (like the one in the first post) to LLVM-IR and reproducing the failure through opt first. Once you've managed that, you can try to manually rewrite the LLVM-IR to include the virtualreverse thing and see if that fixes anything. Maybe EnzymeAD/Enzyme also has test cases using it, which could help with understanding its usage. If you manage to get anything to work (or are stuck), just ping me, and we can work backwards from there, trying to generate the right code from Rust to automate what you did by hand. If you notice that you can't find some needed Rust code in the LLVM-IR, you can try wrapping Rust variables in std::hint::black_box(); that way Rust and LLVM shouldn't optimize them away, and you can use them when manually experimenting with the LLVM-IR. You can also use extern "Rust" (or C) if you want to see how a declaration gets lowered to LLVM-IR (or you can copy the __enzyme_autodiff declarations, which should already exist in the module).
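As a small illustration of the black_box suggestion, the sketch below keeps a primal input, a shadow value and the result visible to the optimizer so they are more likely to survive into the emitted IR. The function names square and run are made up for this example. On a plain toolchain the IR can be dumped with rustc's standard --emit=llvm-ir flag; the Enzyme debugging page linked above describes the project's own procedure for producing a reproducer, which may differ. Note that black_box is documented as best-effort, so this is a hint rather than a guarantee.

```rust
use std::hint::black_box;

// #[inline(never)] keeps `square` as a separate function in the IR,
// which makes it easier to find when editing the IR by hand.
#[inline(never)]
fn square(x: f64) -> f64 {
    x * x
}

fn run() -> f64 {
    // Wrapping the input hides its constant value from the optimizer,
    // so the call below is not folded away at compile time.
    let x = black_box(3.0_f64);
    // A shadow value that nothing reads yet; black_box keeps it around
    // for manual experiments in the IR.
    let shadow = black_box(0.0_f64);
    let y = square(x);
    // Mark the result and the shadow as used.
    black_box((y, shadow));
    y
}

fn main() {
    println!("result = {}", run());
}
```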

https://github.com/EnzymeAD/Enzyme/issues/316 https://github.com/EnzymeAD/Enzyme/issues/1455 https://github.com/EnzymeAD/Enzyme/issues/929 https://github.com/EnzymeAD/Enzyme/issues/891 https://github.com/EnzymeAD/Enzyme/issues/737 https://github.com/EnzymeAD/Enzyme/issues/2178 (Not sure if related:) https://enzyme.mit.edu/julia/stable/faq/#Runtime-Activity

ZuseZ4 avatar Apr 07 '25 23:04 ZuseZ4

Thank you, I was scratching my head over this; this helps a great deal. I have updated the docs for now and created a PR in EnzymeAD/rustbook. If anything is wrong or needs to be changed, please let me know. I will look into the reproducer soon.

KMJ-007 avatar Apr 09 '25 04:04 KMJ-007

@motabbara I just created an issue for generics, and while poking at it a bit I realized that the rustc_autodiff attribute can already handle generics; only the macro cannot, so it should be a trivial fix. Using cargo +enzyme expand and manually copying the <U: Cool> bounds over from square3 to dsquare3, I got it to compile:

#![feature(autodiff)]
#![feature(prelude_import)]
#![feature(print_internals)]
#![feature(fmt_helpers_for_derive)]
#![feature(rustc_attrs)]
#[prelude_import]
use std::prelude::rust_2021::*;
use std::arch::asm;
#[macro_use]
extern crate std;
use std::autodiff::autodiff;
use std::fmt;
struct Foo {
    pub test: f64,
}
#[automatically_derived]
impl ::core::fmt::Debug for Foo {
    #[inline]
    fn fmt(&self, f: &mut ::core::fmt::Formatter) -> ::core::fmt::Result {
        ::core::fmt::Formatter::debug_struct_field1_finish(f, "Foo", "test", &&self.test)
    }
}
pub trait Cool: fmt::Debug {
    fn gen(&self) -> f64;
}
impl Cool for Foo {
    fn gen(&self) -> f64 {
        self.test * self.test
    }
}
#[rustc_autodiff]
#[inline(never)]
pub fn square3<U: Cool>(num: &U, result: &mut f64) {
    *result = num.gen();
}
#[rustc_autodiff(Reverse, 1, Duplicated, Duplicated, None)]
#[inline(never)]
pub fn dsquare3<U: Cool>(num: &U, dnum_0: &mut U, result: &mut f64, dresult_0: &mut f64) {
    unsafe {
        asm!("NOP", options(nomem));
    };
    ::core::hint::black_box(square3(num, result));
    ::core::hint::black_box((dnum_0, dresult_0));
}
fn main() {
    let mut result = 1.0;
    let mut dresult = 0.0;
    let foo = Foo { test: 3.0 };
    let mut dfoo = Foo { test: 0.0 };
    dsquare3(&foo, &mut dfoo, &mut result, &mut dresult);
    {
        ::std::io::_print(format_args!("Result: {0}\n", result));
    };
    {
        ::std::io::_print(format_args!("dResult: {0}\n", dresult));
    };
}

ZuseZ4 avatar Apr 19 '25 09:04 ZuseZ4

Hey @ZuseZ4, can you point me to some beginner-friendly issues so I can get a better idea of the Enzyme codebase and LLVM?

KMJ-007 avatar Apr 19 '25 10:04 KMJ-007

That's great @ZuseZ4 . How does the dyn trait stuff look?

motabbara avatar Apr 22 '25 03:04 motabbara

@motabbara I wrote down some instructions on how to analyze it and potentially add support in https://github.com/EnzymeAD/rust/issues/193#issuecomment-2784871517, but I don't think anyone is currently looking at it, so feel free to give it a try if you have time. My own focus this week is to get std::autodiff on nightly. I currently have Linux (x86-64+aarch) working; Windows and macOS are failing CI: https://github.com/rust-lang/rust/pull/140064

ZuseZ4 avatar Apr 22 '25 03:04 ZuseZ4

Hey @ZuseZ4, I can look into this.

Shourya742 avatar Apr 22 '25 13:04 Shourya742

@motabbara Just as a small update, I reviewed and merged a fix for generic support a few days ago.

ZuseZ4 avatar May 21 '25 02:05 ZuseZ4

@ZuseZ4 is that in nightly or should I be building my own from one of the branches here?

luxteknika avatar May 21 '25 03:05 luxteknika

@luxteknika https://rustc-dev-guide.rust-lang.org/autodiff/installation.html#installation For now you still need to build it from source.

ZuseZ4 avatar May 21 '25 03:05 ZuseZ4