
metal phi3 --dtype bf16 "Function 'cast_f32_bf16' does not exist"

Open · jk2K opened this issue Sep 07 '24

Describe the bug

cargo run  --features metal --package mistralrs-server --bin mistralrs-server -- --token-source cache -i plain -m microsoft/Phi-3.5-mini-instruct -a phi3 --dtype bf16

Error message:

.4800033569336, 64.51000213623047, 64.52999877929688, 64.83999633789063], scaling_type: Su }), max_position_embeddings: 131072, use_flash_attn: false, sliding_window: Some(262144), original_max_position_embeddings: 4096, quantization_config: None }
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:13<00:00, 13.48it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:07<00:00, 9.84it/s]
Error: Metal error Error while loading function: "Function 'cast_f32_bf16' does not exist"

Latest commit or version

5fcc9d6f8c0159feb3a237d07e8b3eb191dc6474

jk2K avatar Sep 07 '24 11:09 jk2K

related to https://github.com/huggingface/candle/issues/2163

jk2K avatar Sep 07 '24 11:09 jk2K

Hey folks - is there a solution for this? Does it mean I can't really use mistral.rs on a Mac for Llama 3.2 vision?

kinchahoy avatar Oct 28 '24 19:10 kinchahoy

@kinchahoy are you having this issue? I cannot reproduce it on my Mac - everything works.

EricLBuehler avatar Oct 28 '24 19:10 EricLBuehler

Hey Eric - thanks for taking a look. I get the error below when I run /examples/python/llama_vision.py with the following changes:

MODEL_ID = "EricB/Llama-3.2-11B-Vision-Instruct-UQFF"

and

which=Which.VisionPlain(
    model_id=MODEL_ID,
    arch=VisionArchitecture.VLlama,
    from_uqff="llama3.2-vision-instruct-q4k.uqff"
),
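
For reference, the full modified script looks roughly like this - a sketch reconstructed from the two changes above and the repo's llama_vision.py example; the request body (image URL, prompt, sampling parameters) is illustrative and may not match the shipped example exactly:

from mistralrs import ChatCompletionRequest, Runner, VisionArchitecture, Which

MODEL_ID = "EricB/Llama-3.2-11B-Vision-Instruct-UQFF"

# Load the UQFF-quantized Llama 3.2 vision model.
runner = Runner(
    which=Which.VisionPlain(
        model_id=MODEL_ID,
        arch=VisionArchitecture.VLlama,
        from_uqff="llama3.2-vision-instruct-q4k.uqff",
    ),
)

# Send one multimodal request; the image URL and prompt are placeholders.
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="llama-vision",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/image.jpg"},
                    },
                    {"type": "text", "text": "What is shown in this image?"},
                ],
            }
        ],
        max_tokens=256,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)

Constructing the Runner is where the Metal error below is raised, before any request is sent.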

❯ python llama_vision_v2.py
2024-10-28T19:33:51.245838Z  INFO mistralrs_core::pipeline::vision: Loading `tokenizer.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.246011Z  INFO mistralrs_core::pipeline::vision: Loading `config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.543602Z  INFO mistralrs_core::pipeline::paths: Found model weight filenames ["residual.safetensors"]
2024-10-28T19:33:51.684169Z  INFO mistralrs_core::pipeline::vision: Loading `generation_config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.800356Z  INFO mistralrs_core::pipeline::vision: Loading `preprocessor_config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.912937Z  INFO mistralrs_core::pipeline::vision: Loading `tokenizer_config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:35:54.736120Z  INFO mistralrs_core::pipeline::vision: Loading model `EricB/Llama-3.2-11B-Vision-Instruct-UQFF` on metal[4294968663].
2024-10-28T19:35:54.736198Z  INFO mistralrs_core::pipeline::vision: Model config: MLlamaConfig { vision_config: MLlamaVisionConfig { hidden_size: 1280, hidden_act: Gelu, num_hidden_layers: 32, num_global_layers: 8, num_attention_heads: 16, num_channels: 3, intermediate_size: 5120, vision_output_dim: 7680, image_size: 560, patch_size: 14, norm_eps: 1e-5, max_num_tiles: 4, intermediate_layers_indices: [3, 7, 15, 23, 30], supported_aspect_ratios: [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1), (4, 1)] }, text_config: MLlamaTextConfig { rope_scaling: Some(MLlamaRopeScaling { rope_type: Llama3, factor: Some(8.0), original_max_position_embeddings: 8192, attention_factor: None, beta_fast: None, beta_slow: None, short_factor: None, long_factor: None, low_freq_factor: Some(1.0), high_freq_factor: Some(4.0) }), vocab_size: 128256, hidden_size: 4096, hidden_act: Silu, num_hidden_layers: 40, num_attention_heads: 32, num_key_value_heads: 8, intermediate_size: 14336, rope_theta: 500000.0, rms_norm_eps: 1e-5, max_position_embeddings: 131072, tie_word_embeddings: false, cross_attention_layers: [3, 8, 13, 18, 23, 28, 33, 38], use_flash_attn: false, quantization_config: None } }
2024-10-28T19:35:54.745491Z  INFO mistralrs_core::utils::normal: DType selected is F16.
Traceback (most recent call last):
  File "/Users/raistlin/mistral.rs/examples/python/llama_vision_v2.py", line 7, in <module>
    runner = Runner(
             ^^^^^^^
ValueError: Metal error Error while loading function: "Function 'cast_bf16_f16' does not exist"

kinchahoy avatar Oct 28 '24 19:10 kinchahoy

@kinchahoy could you please let me know what your hardware (chip, memory, etc) is?

EricLBuehler avatar Oct 28 '24 19:10 EricLBuehler

OS: macOS Sequoia 15.1 arm64
Host: MacBook Air (M2, 2022)
Kernel: Darwin 24.1.0
Display (Color LCD): 3420x2224 @ 60 Hz (as 1710x1112) in 14" [Built-in]
CPU: Apple M2 (8) @ 3.50 GHz
GPU: Apple M2 (10) @ 1.40 GHz [Integrated]
Memory: 9.60 GiB / 16.00 GiB (60%)
Swap: Disabled
Disk (/): 255.10 GiB / 926.35 GiB (28%) - apfs [Read-only]

Thanks again for taking a look at this Eric!

kinchahoy avatar Oct 28 '24 20:10 kinchahoy

Hey @jk2K and @kinchahoy,

I ran across this issue over the weekend. I'm not sure exactly what fixed it, but I installed Rust from Homebrew, updated and upgraded my Homebrew packages (which moved my Python version to the latest 3.11 micro release), updated the command line tools (to get the latest gcc version), and upgraded macOS to the latest available version.

Unfortunately, I couldn't really run Llama 3.2 vision: it uses a lot of RAM and the inference runs on the CPU (super slow on a low-spec M2). Watching Activity Monitor, the GPU stats never change. I might be mistaken on this, but I thought Metal would use the graphics cores.

Anyway, hope this helps you test it yourself.

Regards.

Julio0250 avatar Nov 25 '24 00:11 Julio0250

@Julio0250 could you please let me know what command(s) you used to run Llama 3.2 vision? Perhaps you did not build with Metal support?
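
In case it helps, building the Python package from source with Metal enabled looks roughly like this (a sketch assuming the maturin-based mistralrs-pyo3 setup in this repo; exact flags may differ between versions):

# inside a clone of this repository
pip install maturin
cd mistralrs-pyo3
maturin develop --release --features metal

This replaces the prebuilt wheel in the active virtualenv, so the example script should pick up the Metal-enabled build.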

EricLBuehler avatar Nov 25 '24 01:11 EricLBuehler

I installed mistral.rs via the Python package (pip install mistralrs-metal), then I ran the example here.

I would like to add that I cloned this repository, but didn't do anything other than copy-paste the example into another script at the root of my folder. I'm not sure whether cloning the repository was needed.

Also, I believe the pip package just calls the Rust implementation under the hood, but again, I'm not sure, as this is my first time trying to run big models locally.

This is all that I have installed in my virtualenv.

certifi==2024.8.30
charset-normalizer==3.4.0
filelock==3.16.1
fsspec==2024.10.0
huggingface-hub==0.26.2
idna==3.10
mistralrs-metal==0.3.2
packaging==24.2
PyYAML==6.0.2
requests==2.32.3
tqdm==4.67.1
typing_extensions==4.12.2
urllib3==2.2.3

Thanks for looking at this @EricLBuehler!

Julio0250 avatar Nov 25 '24 05:11 Julio0250