Eval bug: Does llama.cpp not support gfx1103 (AMD 780M iGPU)?
Name and Version
llama-b4730-bin-win-hip-x64-gfx1030
Operating systems
Windows
GGML backends
HIP
Hardware
Ryzen 7 8840U
Models
any model
Problem description & steps to reproduce
When I run .\llama-cli -m C:\Users\LingLong6.lmstudio\models\linglong\qwen7\qwen.gguf --n-gpu-layers 100, it fails. If I omit --n-gpu-layers 100, it works fine.
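(The release asset named above, llama-b4730-bin-win-hip-x64-gfx1030, is compiled only for the gfx1030 / RDNA2 ISA, while the Radeon 780M in the Ryzen 7 8840U reports as gfx1103 / RDNA3, so that build's HIP kernels cannot launch on this GPU; offloading layers then fails on the first kernel. As a sketch of one possible workaround, assuming the AMD HIP SDK and Ninja are installed, llama.cpp could be built from source with gfx1103 in the GPU target list; the exact CMake option names have changed across llama.cpp versions, so treat these as an assumption to verify against the build docs:

:: sketch: build llama.cpp with gfx1103 kernels (HIP SDK + Ninja assumed installed)
cmake -S . -B build -G Ninja -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1103 -DCMAKE_BUILD_TYPE=Release
cmake --build build
)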
First Bad Commit
No response
Relevant log output
load_tensors: ROCm0 model buffer size = 4168.09 MiB
load_tensors: CPU_Mapped model buffer size = 292.36 MiB
.................................................................................
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 4096
llama_init_from_model: n_ctx_per_seq = 4096
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 1000000.0
llama_init_from_model: freq_scale = 1
llama_init_from_model: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 28, can_shift = 1
llama_kv_cache_init: ROCm0 KV buffer size = 224.00 MiB
llama_init_from_model: KV self size = 224.00 MiB, K (f16): 112.00 MiB, V (f16): 112.00 MiB
llama_init_from_model: ROCm_Host output buffer size = 0.58 MiB
llama_init_from_model: ROCm0 compute buffer size = 304.00 MiB
llama_init_from_model: ROCm_Host compute buffer size = 15.01 MiB
llama_init_from_model: graph nodes = 986
llama_init_from_model: graph splits = 2
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
ggml_cuda_compute_forward: RMS_NORM failed
ROCm error: invalid device function
D:/a/llama.cpp/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:73: ROCm error
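("invalid device function" is the HIP error raised when the loaded binary contains no code object matching the GPU's ISA, which is consistent with running a gfx1030-only build on a gfx1103 device. On Linux, ROCm documents the HSA_OVERRIDE_GFX_VERSION environment variable for spoofing a compatible ISA; whether the Windows HIP runtime honors it is uncertain, so the following is a hypothetical workaround, and it also assumes a gfx1100-family release build is used in place of the gfx1030 one:

:: hypothetical: requires a gfx1100-family build; the override may be ignored by the Windows runtime
set HSA_OVERRIDE_GFX_VERSION=11.0.0
.\llama-cli -m C:\Users\LingLong6.lmstudio\models\linglong\qwen7\qwen.gguf --n-gpu-layers 100
)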