edgen
How do I build edgen locally on a Mac?
What is the correct way to build edgen locally on a Mac with Metal?
git clone https://github.com/edgenai/edgen.git
cd edgen/edgen
npm run tauri build
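For completeness, I run the usual prerequisites first (this assumes a standard Tauri setup and may not be exhaustive):
xcode-select --install   # Xcode command-line tools (clang, make)
npm install              # install the tauri CLI and JS dependencies before the build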
This always crashes with a segfault, with or without the llama_metal feature. It used to work but has started failing recently.
cargo run --release --features llama_metal -- serve
Compiling edgen v0.1.3 (/Users/username/code/tmp/edgen/edgen/src-tauri)
Finished release [optimized] target(s) in 3.10s
Running `/Users/username/code/tmp/edgen/target/release/edgen serve`
Segmentation fault: 11
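Since the segfault itself prints nothing useful, a debug build can sometimes surface the underlying native assertion instead (a sketch; same command, just without --release):
cargo run --features llama_metal -- serve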
curl http://localhost:33322/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer no-key-required" -d '{
"model": "default",
"messages": [
{
"role": "system",
"content": "You are EdgenChat, a helpful AI assistant."
},
{
"role": "user",
"content": "Hello!"
}
]
}'
I'm using the default config and have reset it too.
@prabirshrestha when the build fails, does it output a specific error message?
The build works, but running the server fails with the error I mentioned. That is the only error I see, even with RUST_BACKTRACE=1.
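Since the crash is in native code, RUST_BACKTRACE can't show anything; to get a native stack I can run the binary under lldb instead (a sketch, assuming the default target directory):
lldb -- target/release/edgen serve   # then `run` inside lldb, and `bt` after the crash to dump the native stack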
@prabirshrestha this: "Segmentation fault: 11"?
Yes. The official release version also seems to fail on Mac now. Probably some change in master is causing the issue.
Now I'm getting this error:
Finished dev [unoptimized + debuginfo] target(s) in 8.58s
Running `target/debug/edgen serve`
Assertion failed: (ne % ggml_blck_size(type) == 0), function ggml_row_size, file ggml.c, line 2126.
Abort trap: 6
Most likely. I'll inspect the CI build; it might be a system dep or something.
Here are the new logs from commit 45f2a7d7034621832891518b13a5855948c89771:
/Users/prabirshrestha/code/tmp/edgen$ cargo run --release
Compiling edgen v0.1.5 (/Users/prabirshrestha/code/tmp/edgen/edgen/src-tauri)
Finished release [optimized] target(s) in 2.99s
Running `target/release/edgen`
2024-03-27T02:34:21.218710Z INFO edgen_core::settings: Loading existing settings file: /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/edgen.conf.yaml
2024-03-27T02:34:21.221257Z INFO edgen_server: Using default URI
2024-03-27T02:34:21.221333Z INFO edgen_server: Listening in on: http://127.0.0.1:33322
2024-03-27T02:34:33.235666Z INFO edgen_server::model: Loading existing model patterns file
2024-03-27T02:34:33.235867Z INFO hf_hub: Token file not found "/Users/prabirshrestha/.cache/huggingface/token"
2024-03-27T02:34:33.236960Z INFO edgen_server::status: progress observer: no download necessary, file is already there
2024-03-27T02:34:33.237134Z INFO edgen_core::perishable: (Re)Creating a new llama_cpp::model::LlamaModel
2024-03-27T02:34:33.237180Z INFO edgen_rt_llama_cpp: Loading /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf into memory
2024-03-27T02:34:33.238119Z INFO llama_cpp::model: Loading model "/Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf"
2024-03-27T02:34:33.242906Z INFO llama.cpp: llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /Users/prabirshrestha/Library/Application Support/com.EdgenAI.Edgen/models/chat/completions/models--TheBloke--neural-chat-7B-v3-3-GGUF/snapshots/5a354dacb2b2e2014cd239755920b2362be64d13/neural-chat-7b-v3-3.Q4_K_M.gguf (version GGUF V3 (latest))
2024-03-27T02:34:33.242920Z INFO llama.cpp: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2024-03-27T02:34:33.242926Z INFO llama.cpp: llama_model_loader: - kv 0: general.architecture str = llama
2024-03-27T02:34:33.242929Z INFO llama.cpp: llama_model_loader: - kv 1: general.name str = intel_neural-chat-7b-v3-3
2024-03-27T02:34:33.242932Z INFO llama.cpp: llama_model_loader: - kv 2: llama.context_length u32 = 32768
2024-03-27T02:34:33.242934Z INFO llama.cpp: llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
2024-03-27T02:34:33.242936Z INFO llama.cpp: llama_model_loader: - kv 4: llama.block_count u32 = 32
2024-03-27T02:34:33.242939Z INFO llama.cpp: llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
2024-03-27T02:34:33.242941Z INFO llama.cpp: llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
2024-03-27T02:34:33.242943Z INFO llama.cpp: llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
2024-03-27T02:34:33.242946Z INFO llama.cpp: llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
2024-03-27T02:34:33.242950Z INFO llama.cpp: llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
2024-03-27T02:34:33.242954Z INFO llama.cpp: llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
2024-03-27T02:34:33.242956Z INFO llama.cpp: llama_model_loader: - kv 11: general.file_type u32 = 15
2024-03-27T02:34:33.242958Z INFO llama.cpp: llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
2024-03-27T02:34:33.247335Z INFO llama.cpp: llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
2024-03-27T02:34:33.255357Z INFO llama.cpp: llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
2024-03-27T02:34:33.256454Z INFO llama.cpp: llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
2024-03-27T02:34:33.256457Z INFO llama.cpp: llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
2024-03-27T02:34:33.256459Z INFO llama.cpp: llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
2024-03-27T02:34:33.256461Z INFO llama.cpp: llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
2024-03-27T02:34:33.256462Z INFO llama.cpp: llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
2024-03-27T02:34:33.256464Z INFO llama.cpp: llama_model_loader: - kv 20: general.quantization_version u32 = 2
2024-03-27T02:34:33.256466Z INFO llama.cpp: llama_model_loader: - type f32: 65 tensors
2024-03-27T02:34:33.256468Z INFO llama.cpp: llama_model_loader: - type q4_K: 193 tensors
2024-03-27T02:34:33.256470Z INFO llama.cpp: llama_model_loader: - type q6_K: 33 tensors
2024-03-27T02:34:33.266441Z INFO llama.cpp: llm_load_vocab: special tokens definition check successful ( 259/32000 ).
2024-03-27T02:34:33.266445Z INFO llama.cpp: llm_load_print_meta: format = GGUF V3 (latest)
2024-03-27T02:34:33.266447Z INFO llama.cpp: llm_load_print_meta: arch = llama
2024-03-27T02:34:33.266448Z INFO llama.cpp: llm_load_print_meta: vocab type = SPM
2024-03-27T02:34:33.266450Z INFO llama.cpp: llm_load_print_meta: n_vocab = 32000
2024-03-27T02:34:33.266451Z INFO llama.cpp: llm_load_print_meta: n_merges = 0
2024-03-27T02:34:33.266453Z INFO llama.cpp: llm_load_print_meta: n_ctx_train = 32768
2024-03-27T02:34:33.266454Z INFO llama.cpp: llm_load_print_meta: n_embd = 4096
2024-03-27T02:34:33.266456Z INFO llama.cpp: llm_load_print_meta: n_head = 32
2024-03-27T02:34:33.266458Z INFO llama.cpp: llm_load_print_meta: n_head_kv = 8
2024-03-27T02:34:33.266459Z INFO llama.cpp: llm_load_print_meta: n_layer = 32
2024-03-27T02:34:33.266460Z INFO llama.cpp: llm_load_print_meta: n_rot = 128
2024-03-27T02:34:33.266462Z INFO llama.cpp: llm_load_print_meta: n_embd_head_k = 128
2024-03-27T02:34:33.266463Z INFO llama.cpp: llm_load_print_meta: n_embd_head_v = 128
2024-03-27T02:34:33.266465Z INFO llama.cpp: llm_load_print_meta: n_gqa = 4
2024-03-27T02:34:33.266466Z INFO llama.cpp: llm_load_print_meta: n_embd_k_gqa = 1024
2024-03-27T02:34:33.266468Z INFO llama.cpp: llm_load_print_meta: n_embd_v_gqa = 1024
2024-03-27T02:34:33.266469Z INFO llama.cpp: llm_load_print_meta: f_norm_eps = 0.0e+00
2024-03-27T02:34:33.266471Z INFO llama.cpp: llm_load_print_meta: f_norm_rms_eps = 1.0e-05
2024-03-27T02:34:33.266473Z INFO llama.cpp: llm_load_print_meta: f_clamp_kqv = 0.0e+00
2024-03-27T02:34:33.266474Z INFO llama.cpp: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
2024-03-27T02:34:33.266475Z INFO llama.cpp: llm_load_print_meta: f_logit_scale = 0.0e+00
2024-03-27T02:34:33.266477Z INFO llama.cpp: llm_load_print_meta: n_ff = 14336
2024-03-27T02:34:33.266478Z INFO llama.cpp: llm_load_print_meta: n_expert = 0
2024-03-27T02:34:33.266480Z INFO llama.cpp: llm_load_print_meta: n_expert_used = 0
2024-03-27T02:34:33.266481Z INFO llama.cpp: llm_load_print_meta: causal attn = 1
2024-03-27T02:34:33.266483Z INFO llama.cpp: llm_load_print_meta: pooling type = 0
2024-03-27T02:34:33.266484Z INFO llama.cpp: llm_load_print_meta: rope type = 0
2024-03-27T02:34:33.266485Z INFO llama.cpp: llm_load_print_meta: rope scaling = linear
2024-03-27T02:34:33.266487Z INFO llama.cpp: llm_load_print_meta: freq_base_train = 10000.0
2024-03-27T02:34:33.266489Z INFO llama.cpp: llm_load_print_meta: freq_scale_train = 1
2024-03-27T02:34:33.266490Z INFO llama.cpp: llm_load_print_meta: n_yarn_orig_ctx = 32768
2024-03-27T02:34:33.266492Z INFO llama.cpp: llm_load_print_meta: rope_finetuned = unknown
2024-03-27T02:34:33.266493Z INFO llama.cpp: llm_load_print_meta: ssm_d_conv = 0
2024-03-27T02:34:33.266495Z INFO llama.cpp: llm_load_print_meta: ssm_d_inner = 0
2024-03-27T02:34:33.266496Z INFO llama.cpp: llm_load_print_meta: ssm_d_state = 0
2024-03-27T02:34:33.266497Z INFO llama.cpp: llm_load_print_meta: ssm_dt_rank = 0
2024-03-27T02:34:33.266499Z INFO llama.cpp: llm_load_print_meta: model type = 7B
2024-03-27T02:34:33.266521Z INFO llama.cpp: llm_load_print_meta: model ftype = Q4_K - Medium
2024-03-27T02:34:33.266523Z INFO llama.cpp: llm_load_print_meta: model params = 7.24 B
2024-03-27T02:34:33.266525Z INFO llama.cpp: llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
2024-03-27T02:34:33.266526Z INFO llama.cpp: llm_load_print_meta: general.name = intel_neural-chat-7b-v3-3
2024-03-27T02:34:33.266528Z INFO llama.cpp: llm_load_print_meta: BOS token = 1 '<s>'
2024-03-27T02:34:33.266529Z INFO llama.cpp: llm_load_print_meta: EOS token = 2 '</s>'
2024-03-27T02:34:33.266531Z INFO llama.cpp: llm_load_print_meta: UNK token = 0 '<unk>'
2024-03-27T02:34:33.266533Z INFO llama.cpp: llm_load_print_meta: PAD token = 0 '<unk>'
2024-03-27T02:34:33.266534Z INFO llama.cpp: llm_load_print_meta: LF token = 13 '<0x0A>'
2024-03-27T02:34:33.266550Z INFO llama.cpp: llm_load_tensors: ggml ctx size = 0.11 MiB
2024-03-27T02:34:33.267130Z INFO llama.cpp: llm_load_tensors: CPU buffer size = 4165.37 MiB
2024-03-27T02:34:33.267526Z WARN llama_cpp::model: Could not find metadata key="%s.attention.key_length"
2024-03-27T02:34:33.267530Z WARN llama_cpp::model: Could not find metadata key="%s.attention.value_length"
2024-03-27T02:34:33.267533Z WARN llama_cpp::model: Could not find metadata key="%s.ssm.conv_kernel"
2024-03-27T02:34:33.267535Z WARN llama_cpp::model: Could not find metadata key="%s.ssm.inner_size"
2024-03-27T02:34:33.267536Z WARN llama_cpp::model: Could not find metadata key="%s.ssm.state_size"
2024-03-27T02:34:33.267556Z INFO edgen_rt_llama_cpp: No matching session found, creating new one
2024-03-27T02:34:33.267567Z INFO edgen_core::perishable: (Re)Creating a new llama_cpp::session::LlamaSession
2024-03-27T02:34:33.267569Z INFO edgen_rt_llama_cpp: Allocating new LLM session
2024-03-27T02:34:33.267581Z INFO llama.cpp: llama_new_context_with_model: n_ctx = 4096
2024-03-27T02:34:33.267584Z INFO llama.cpp: llama_new_context_with_model: n_batch = 2048
2024-03-27T02:34:33.267585Z INFO llama.cpp: llama_new_context_with_model: n_ubatch = 512
2024-03-27T02:34:33.267587Z INFO llama.cpp: llama_new_context_with_model: freq_base = 10000.0
2024-03-27T02:34:33.267589Z INFO llama.cpp: llama_new_context_with_model: freq_scale = 1
2024-03-27T02:34:33.304810Z INFO llama.cpp: llama_kv_cache_init: CPU KV buffer size = 512.00 MiB
2024-03-27T02:34:33.304822Z INFO llama.cpp: llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
2024-03-27T02:34:33.321556Z INFO llama.cpp: llama_new_context_with_model: CPU output buffer size = 250.00 MiB
GGML_ASSERT: /Users/prabirshrestha/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/ggml.c:4906: b->type == GGML_TYPE_I32
Abort trap: 6
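In case stale build artifacts or cached checkouts are part of it (just a guess on my part), a clean rebuild may be worth trying:
cargo clean                                      # drop all build artifacts
rm -rf ~/.cargo/git/checkouts/whisper_cpp-rs-*   # force a fresh whisper_cpp-rs checkout
cargo run --release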
Would you like to share your environment: OS version, Rust and Node.js toolchain versions, and so on?
I'll build this on my Mac and see where we stand
@francis2tm see this
@prabirshrestha I tried building it on a Mac; I think there might be some missing system deps:
I made a fork and added some README instructions: https://github.com/opeolluwa/edgen/tree/main/edgen. Follow the instructions and let's see where we go from there.
Looking for "nm" or an equivalent tool
NM_PATH not set, looking for ["nm", "llvm-nm"] in PATH
Valid tool found:
llvm-nm, compatible with GNU nm
Apple LLVM version 14.0.3 (clang-1403.0.22.14.1)
Optimized build.
Default target: arm64-apple-darwin22.6.0
Host CPU: apple-m1
cargo:rerun-if-env-changed=OBJCOPY_PATH
Looking for "objcopy" or an equivalent tool..
OBJCOPY_PATH not set, looking for ["llvm-objcopy"] in PATH
--- stderr
CMake Warning:
Manually-specified variables were not used by the project:
CMAKE_ASM_COMPILER
CMAKE_ASM_FLAGS
make: warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
/Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:1026:75: warning: unused parameter 'params' [-Wunused-parameter]
static ggml_backend_t whisper_backend_init(const whisper_context_params & params) {
^
/Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:1620:27: warning: unused parameter 'mel_offset' [-Wunused-parameter]
const int mel_offset) {
^
/Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/thirdparty/whisper.cpp/whisper.cpp:202:29: warning: unused function 'ggml_mul_mat_pad' [-Wunused-function]
static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
^
3 warnings generated.
thread 'main' panicked at /Users/USER/.cargo/git/checkouts/whisper_cpp-rs-bf1f9509542b2c2d/fa76538/crates/whisper_cpp_sys/build.rs:295:9:
No suitable tool equivalent to "objcopy" has been found in PATH, if one is already installed, either add its directory to PATH or set OBJCOPY_PATH to its full path. For your Operating System we recommend:
"llvm-objcopy" from LLVM 17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
Error failed to build app: failed to build app
You can also check out https://docs.edgen.co
I can build, but when I call the completions API it crashes.
Assertion failed: (ne % ggml_blck_size(type) == 0), function ggml_row_size, file ggml.c, line 2126.
ELIFECYCLE Command failed.
It seems I was able to run v0.1.2, but it started crashing with v0.1.3.
That's after the instructions above, correct?
Yes, that is after the instructions. I uninstalled all Rust toolchains too. It's probably worth adding this line to the doc as well, in case you have multiple toolchains:
rustup override set beta-2023-11-21
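And to sanity-check which toolchain is actually active in the repo:
rustup show active-toolchain   # confirms which toolchain the override resolves to in this directory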
One thing I did to get past that error was add this to my profile after brew install llvm:
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
If I remove it, I get the same error you do.
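A quick way to verify the tool resolution afterwards (a sketch):
which llvm-objcopy        # should resolve to /opt/homebrew/opt/llvm/bin/llvm-objcopy
llvm-objcopy --version    # confirms the Homebrew LLVM copy is the one being picked up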
Let me get this straight: the application now builds, after you've installed llvm and removed the existing Rust toolchain?
The application builds once I run the following commands. The Rust toolchain didn't have much impact, as I was able to build and run with other toolchains too. Just to be sure, I removed all Rust toolchains and kept only beta-2023-11-21.
brew install llvm
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
I'm also able to run the edgen app; I can see it in the taskbar and the window opens. But as soon as I make a request to http://localhost:33322/v1/chat/completions, it crashes.
Ok good! 👍 We're making some progress. Let's pick up again tomorrow; it's midnight my time.