candle
Function 'cast_bf16_f16' does not exist
I'm using the stable-diffusion-3 example on a Mac with the metal feature. When I run these lines:
let vb_vae = vb.clone().rename_f(sd3_vae_vb_rename).pp("first_stage_model");
println!("vb_vae dtype {:?}", vb_vae.dtype());
let autoencoder = build_sd3_vae_autoencoder(vb_vae)?;
println!("autoencoder {:?}", autoencoder);
I get this error when it tries to build the autoencoder:
vb_vae dtype F16
creating autoencoder
Error: Metal error Error while loading function: "Function 'cast_bf16_f16' does not exist"
Probably the same as #2406
So does that mean it's a clang problem and not a candle/metal problem? This one is for f16, and I'm not sure how to troubleshoot it.
I doubt that it's related to the f16 part. Most likely it's the bf16 part, as bf16 support is detected automatically and the detection fails for older versions of clang (and maybe in other cases?).
The detection is conditioned in the .metal file on the __HAVE_BFLOAT__ macro. Googling around, you find this issue, which was mentioned in the other thread: https://github.com/tinygrad/tinygrad/issues/3453 .
I finally found a workaround for the Function 'cast_bf16_f32' does not exist error. Rust's minimum macOS deployment target is 11.0:
$ rustc --print deployment-target
MACOSX_DEPLOYMENT_TARGET=11.0
Setting the MACOSX_DEPLOYMENT_TARGET env var to at least 14.0 is the solution.
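As a rough illustration of that threshold, here is a minimal Rust sketch that reads MACOSX_DEPLOYMENT_TARGET and checks it against 14.0. The helper names `parse_deployment_target` and `meets_min_target` are hypothetical and not part of candle or rustc. The 14.0 cutoff is simply the value reported above, not something verified against Apple's documentation.

```rust
/// Parse a deployment target such as "11.0" or "15.3" into (major, minor).
/// A missing minor component is treated as 0. (Hypothetical helper.)
fn parse_deployment_target(s: &str) -> Option<(u32, u32)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = match parts.next() {
        Some(p) => p.parse().ok()?,
        None => 0,
    };
    Some((major, minor))
}

/// True if `target` is at least 14.0, the threshold reported in this thread.
fn meets_min_target(target: &str) -> bool {
    matches!(parse_deployment_target(target), Some(v) if v >= (14, 0))
}

fn main() {
    match std::env::var("MACOSX_DEPLOYMENT_TARGET") {
        Ok(v) if meets_min_target(&v) => println!("deployment target {v} is at least 14.0"),
        Ok(v) => println!("deployment target {v} is below 14.0"),
        // When the variable is unset, rustc uses its own default
        // (see `rustc --print deployment-target`).
        Err(_) => println!("MACOSX_DEPLOYMENT_TARGET is not set"),
    }
}
```

A check like this could run in a build script or a small diagnostic binary before building with the metal feature. It only inspects the environment variable and does not query Metal itself.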
$ MACOSX_DEPLOYMENT_TARGET=15.3 cargo run --example mistral --release --features metal -- --prompt 'Write helloworld code in Rust' --sample-len 150
Finished `release` profile [optimized] target(s) in 0.48s
Running `target/release/examples/mistral --prompt 'Write helloworld code in Rust' --sample-len 150`
avx: false, neon: true, simd128: false, f16c: false
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 17.53025ms
loaded the model in 6.718008917s
Write helloworld code in Rust
```rust
fn main() {
println!("Hello, world!");
}
```
## Compile and run
```bash
$ cargo build
Compiling hello-world v0.1.0 (file:///Users/john/code/hello-world)
Finished dev [unoptimized + debuginfo] target(s) in 0.42s
$ ./target/debug/hello-world
Hello, world!
```
## Hello World with Cargo
```bash
$ cargo new hello-world --bin
Created binary (application) `hello-world` package
150 tokens generated (7.03 token/s)
You can ignore the rest of this rant. The solution is above the fold!
I really wish this was documented somewhere in the README or installation instructions. It was an all-day task down the rabbit hole to finally hit the root cause.
I went through a lot of misleading information, especially in that tinygrad link. LLVM has absolutely nothing to do with it.
First, I installed LLVM 19 with homebrew, configured my environment as documented in its caveats, no luck. I ended up removing it.
My laptop is an M3 Max, so it definitely supports Metal v3.2. But I found that I couldn't query the Metal version with xcrun metal --version. So, after installing the Xcode application from the App Store and accepting its license, I could query the Metal version after following these additional steps. Still no luck running the candle examples with --features metal. I doubt that I need the proprietary parts of Xcode. The Command Line Tools ought to be enough. But I will keep it around for now, I guess.
I went down some other dead ends around upgrading the metal crate because the version you depend on only knows that versions up to 2.4 exist. The upgrade went smoothly, but still didn't fix the problem. After patching candle-metal-kernels to explicitly set the Metal version to v3.2, it worked!
Querying the default version through the metal crate, I was very confused about why it was v2.3. So, I went around in circles through Apple documentation, random help forum threads and GitHub issues until I happened upon an old announcement on the Rust blog. And that finally led me to the rustc documentation and the environment variable that needs to be set.
So, the big question for all of the "It Works For Me ™️" developers out there: Have you hardcoded this environment variable on your system somewhere? Or are you building through Xcode in some way to increase the minimum deployment target? Is there some other way to set the default system-wide deployment target?
I have candle running on at least 4 different Macs and didn't have to hardcode the environment variable. Building is done via cargo build --features metal ....
I encountered the missing bf16 issue a while back while using a nix env on my Mac, but I haven't run into any issues since I ditched nix for this use case.
And does the rustc command show you are using deployment target 11.0?
Just tested here on an M1 Mac with:
MACOSX_DEPLOYMENT_TARGET=15.3 cargo run --example stable-diffusion-3 --release --features=metal -- \
--which 3.5-medium --height 1024 --width 1024 \
--prompt 'lets go'
and got this error:
`release` profile [optimized] target(s) in 0.57s
Running `target/release/examples/stable-diffusion-3 --which 3.5-medium --height 1024 --width 1024 --prompt 'lets go'`
Sampling done. 28 steps. 532.97s. Average rate: 0.05 iter/s
Error: Metal error Error while loading function: Function 'cast_bf16_f16' does not exist
Caused by:
Error while loading function: Function 'cast_bf16_f16' does not exist
@AlpineVibrations Macs got bfloat16 support from the M2 onwards (Metal feature set).
The only solution imo is to add a bf16 shim (wrapped u16 like the half crate) when __HAVE_BFLOAT__ is false.
We're going to be adding this kind of shim for metal fp8 soon anyway (I have functional casting [and gemm] implemented) so I could add the same for bf16 as well.
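To make the "wrapped u16" idea concrete, here is a minimal sketch in Rust of the bit-level conversion such a shim relies on. It assumes only that bf16 is the upper 16 bits of an IEEE f32. The type name `Bf16Shim` is hypothetical, and the actual shim would live in the Metal kernels rather than in Rust, so treat this as an illustration of the technique and not as candle's implementation.

```rust
/// Hypothetical stand-in for a native bfloat16 type: the raw 16 bits live in a u16.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bf16Shim(u16);

impl Bf16Shim {
    /// Convert from f32, rounding the 16 dropped bits to nearest, ties to even.
    fn from_f32(x: f32) -> Self {
        let bits = x.to_bits();
        if x.is_nan() {
            // Keep sign and exponent and force a mantissa bit, so the value
            // is still NaN after the low 16 bits are dropped.
            return Bf16Shim(((bits >> 16) as u16) | 0x0040);
        }
        // Adding 0x7FFF plus the lowest kept bit rounds ties to even.
        let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
        Bf16Shim((bits.wrapping_add(rounding_bias) >> 16) as u16)
    }

    /// Widen back to f32 by placing the 16 bits in the upper half.
    fn to_f32(self) -> f32 {
        f32::from_bits((self.0 as u32) << 16)
    }
}

fn main() {
    let x = Bf16Shim::from_f32(3.14159);
    println!("{:?} widens back to {}", x, x.to_f32());
}
```

A cast such as bf16 to f16 could then be expressed as a widen to f32 followed by a narrow to f16, at the cost of extra instructions compared with a native bfloat type.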