llama-node
[ASK] enable cuda with manual compilation
Hi,
According to the llama.cpp GitHub repo, it's now possible to use CUDA on NVIDIA GPUs via the cuBLAS build.
So you may see where I'm going with this: how do I do a manual compilation, using make or cmake args, to enable LLAMA_CUBLAS?
:)
Greetings
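For reference, the upstream llama.cpp README describes the cuBLAS build like this at the time of writing; it assumes the NVIDIA CUDA toolkit is installed, and the flag names should be double-checked against the current repo:

# make-based build, from the llama.cpp checkout
make clean
make LLAMA_CUBLAS=1

# or the cmake-based build
mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release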
Hi, I plan to support this CUDA feature in the near future. Thanks for your suggestion.
This issue is pending the new llama sampling logic here: https://github.com/Atome-FE/llama-node/issues/36
Currently this issue is also blocked by a static linking problem in llama.cpp's CMake: https://github.com/ggerganov/llama.cpp/pull/1128#issuecomment-1531661524
I can compile with this:
let command = command
.arg("..")
.arg("-DCMAKE_BUILD_TYPE=Release")
.arg("-DLLAMA_OPENBLAS=ON")
.arg("-DLLAMA_CUBLAS=ON")
.arg("-DLLAMA_SHARED_LIBS=ON")
.arg("-DLLAMA_STATIC=ON")
.arg("-DLLAMA_ALL_WARNINGS=OFF")
.arg("-DLLAMA_ALL_WARNINGS_3RD_PARTY=OFF")
.arg("-DLLAMA_BUILD_TESTS=OFF")
.arg("-DLLAMA_BUILD_EXAMPLES=OFF")
.arg("-DCMAKE_POSITION_INDEPENDENT_CODE=ON");
But as a result, OpenBLAS is not activated, and cuBLAS doesn't seem to be activated either; it doesn't use my GPU.
Worse, I've also compiled llama.cpp itself from the GitHub source: BLAS works fine, but CLBlast and cuBLAS don't. It puts some data in VRAM, but it still only uses the CPU, not the GPU.
So I think we should wait a little bit for an update from llama.cpp.
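Note that two of those flags contradict each other, which may be why neither BLAS backend activates: -DLLAMA_STATIC=ON requests static linking while -DLLAMA_SHARED_LIBS=ON requests shared libraries (and llama.cpp's CMake spells the latter BUILD_SHARED_LIBS anyway, so LLAMA_SHARED_LIBS is likely ignored). A minimal sketch of a dynamic-linking flag set, assuming llama.cpp's CMakeLists of that era — treat the flag names as assumptions to verify:

let command = command
    .arg("..")
    .arg("-DCMAKE_BUILD_TYPE=Release")
    .arg("-DLLAMA_CUBLAS=ON")
    // build a shared libllama that the Node addon can link against dynamically
    .arg("-DBUILD_SHARED_LIBS=ON")
    // required for code that ends up inside a shared object
    .arg("-DCMAKE_POSITION_INDEPENDENT_CODE=ON")
    // note: no -DLLAMA_STATIC=ON here; it conflicts with shared libs
    .arg("-DLLAMA_BUILD_TESTS=OFF")
    .arg("-DLLAMA_BUILD_EXAMPLES=OFF");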
@tchereau great work. I was struggling with an -fPIC error for several hours and only solved it by dynamic linking. Thanks for your investigation. But it's true that the cuBLAS flag doesn't accelerate the evaluation properly. We'll have to wait a while.
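For context on that -fPIC error: a .node addon is itself a shared object, and on x86_64 every object file linked into a shared object must be compiled as position-independent code, so a static libllama.a built without -fPIC fails at the final link. A minimal reproduction sketch with made-up file names:

# static library built WITHOUT -fPIC
gcc -c foo.c -o foo.o
ar rcs libfoo.a foo.o
# linking it into a shared object typically fails with
# "relocation ... can not be used when making a shared object; recompile with -fPIC"
gcc -shared main.c libfoo.a -o addon.so
# fix: compile with -fPIC, which is what
# -DCMAKE_POSITION_INDEPENDENT_CODE=ON requests on the CMake side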
Hi, I think you can compile with this, but can you actually run it? I got an error related to cudaLaunchKernel, so I think something is abnormal with static linking. As a consequence, I'm going to provide a way to do dynamic linking, but I won't offer it as the default prebuilt binary.
Well, yes, I can compile, but I can't run it:
pnpm build:llama-cpp
> [email protected] build:llama-cpp /root/git/llama-node
> pnpm run --filter=@llama-node/llama-cpp cross-compile
> @llama-node/[email protected] cross-compile /root/git/llama-node/packages/llama-cpp
> rimraf @llama-node && tsx scripts/cross-compile.mts
info: component 'rust-std' for target 'x86_64-unknown-linux-gnu' is up to date
info: component 'rust-std' for target 'x86_64-unknown-linux-musl' is up to date
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
/bin/sh: 1: zig: not found
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
Blocking waiting for file lock on build directory
Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
--> packages/llama-cpp/src/context.rs:189:17
|
189 | let mut id = 0;
| ^^
|
= help: maybe it is overwritten before being read?
= note: `#[warn(unused_assignments)]` on by default
warning: `llama-node-cpp` (lib) generated 1 warning
Finished release [optimized] target(s) in 55.77s
Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
--> packages/llama-cpp/src/context.rs:189:17
|
189 | let mut id = 0;
| ^^
|
= help: maybe it is overwritten before being read?
= note: `#[warn(unused_assignments)]` on by default
warning: `llama-node-cpp` (lib) generated 1 warning
Finished release [optimized] target(s) in 1m 34s
nodejs:
node .
node:internal/modules/cjs/loader:1338
return process.dlopen(module, path.toNamespacedPath(filename));
^
Error: /root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node: undefined symbol: cublasSetMathMode
at Module._extensions..node (node:internal/modules/cjs/loader:1338:18)
at Module.load (node:internal/modules/cjs/loader:1117:32)
at Module._load (node:internal/modules/cjs/loader:958:12)
at Module.require (node:internal/modules/cjs/loader:1141:19)
at require (node:internal/modules/cjs/helpers:110:18)
at Object.<anonymous> (/root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/index.js:188:31)
at Module._compile (node:internal/modules/cjs/loader:1254:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
at Module.load (node:internal/modules/cjs/loader:1117:32)
at Module._load (node:internal/modules/cjs/loader:958:12) {
code: 'ERR_DLOPEN_FAILED'
}
Node.js v18.16.0
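That undefined symbol: cublasSetMathMode at dlopen time means the addon references cuBLAS but was never actually linked against libcublas. You can confirm this with standard binutils; the path below is shortened from the log above, so adjust it to your layout:

# the addon's recorded shared-library dependencies; if CUDA linking
# worked, some libcublas entry should appear here
ldd @llama-node/llama-cpp.linux-x64-gnu.node | grep -i cublas

# undefined dynamic symbols; a "U cublasSetMathMode" line means the
# symbol is required but no linked library provides it
nm -D @llama-node/llama-cpp.linux-x64-gnu.node | grep cublasSetMathMode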
And with CLBlast:
pnpm build:llama-cpp
> [email protected] build:llama-cpp /root/git/llama-node
> pnpm run --filter=@llama-node/llama-cpp cross-compile
> @llama-node/[email protected] cross-compile /root/git/llama-node/packages/llama-cpp
> rimraf @llama-node && tsx scripts/cross-compile.mts
info: component 'rust-std' for target 'x86_64-unknown-linux-gnu' is up to date
info: component 'rust-std' for target 'x86_64-unknown-linux-musl' is up to date
/bin/sh: 1: zig: not found
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
warning: /root/git/llama-node/Cargo.toml: unused manifest key: workspace.package.name
Blocking waiting for file lock on package cache
Blocking waiting for file lock on package cache
Blocking waiting for file lock on build directory
Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
--> packages/llama-cpp/src/context.rs:189:17
|
189 | let mut id = 0;
| ^^
|
= help: maybe it is overwritten before being read?
= note: `#[warn(unused_assignments)]` on by default
warning: `llama-node-cpp` (lib) generated 1 warning
Finished release [optimized] target(s) in 49.25s
Compiling llama-sys v0.0.1 (/root/git/llama-node/packages/llama-cpp/llama-sys)
Compiling llama-node-cpp v0.1.0 (/root/git/llama-node/packages/llama-cpp)
warning: value assigned to `id` is never read
--> packages/llama-cpp/src/context.rs:189:17
|
189 | let mut id = 0;
| ^^
|
= help: maybe it is overwritten before being read?
= note: `#[warn(unused_assignments)]` on by default
warning: `llama-node-cpp` (lib) generated 1 warning
Finished release [optimized] target(s) in 1m 20s
nodejs:
node .
node:internal/modules/cjs/loader:1338
return process.dlopen(module, path.toNamespacedPath(filename));
^
Error: /root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/@llama-node/llama-cpp.linux-x64-gnu.node: undefined symbol: clBuildProgram
at Module._extensions..node (node:internal/modules/cjs/loader:1338:18)
at Module.load (node:internal/modules/cjs/loader:1117:32)
at Module._load (node:internal/modules/cjs/loader:958:12)
at Module.require (node:internal/modules/cjs/loader:1141:19)
at require (node:internal/modules/cjs/helpers:110:18)
at Object.<anonymous> (/root/git/llama-selfbot/node_modules/llama-node/node_modules/@llama-node/llama-cpp/index.js:188:31)
at Module._compile (node:internal/modules/cjs/loader:1254:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
at Module.load (node:internal/modules/cjs/loader:1117:32)
at Module._load (node:internal/modules/cjs/loader:958:12) {
code: 'ERR_DLOPEN_FAILED'
}
Node.js v18.16.0
Same with .arg("-DLLAMA_OPENBLAS=ON") only:
undefined symbol: cblas_sgemm
In all cases, the problem seems to be libraries that are not linked.
At this point I can't help further, because I don't have the knowledge in C/Rust.
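That diagnosis fits all three symbols: cublasSetMathMode, clBuildProgram, and cblas_sgemm live in libcublas, libOpenCL, and libopenblas respectively, and each error means the final link never pulled the library in. In a Rust crate, the usual fix is for the build script to emit link directives; a minimal sketch, assuming it goes in llama-sys's build.rs and that the search path matches your CUDA install (both are assumptions, not llama-node's actual build script):

// build.rs (sketch): tell rustc where the libraries live and to link
// them dynamically into the final cdylib
fn main() {
    // search path for the CUDA toolkit; adjust to your install
    println!("cargo:rustc-link-search=native=/usr/local/cuda/lib64");
    // resolves cublasSetMathMode and friends at load time
    println!("cargo:rustc-link-lib=dylib=cublas");
    println!("cargo:rustc-link-lib=dylib=cudart");
    // for the CLBlast build instead: resolves clBuildProgram
    // println!("cargo:rustc-link-lib=dylib=OpenCL");
    // for the OpenBLAS build: resolves cblas_sgemm
    // println!("cargo:rustc-link-lib=dylib=openblas");
}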
Oops, I will reopen and leave this until static linking is ready someday... currently we only provide a self-built dynamic linking version.
Here are the customizable build features for cargo: https://github.com/Atome-FE/llama-node/pull/42/files. I will prepare separate docs for this.
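Cargo features of that kind are typically switched on at build time like this; the feature name cuda below is only an illustration, so check the PR for the real names:

# enable an optional cargo feature when building from source
# ("cuda" is a hypothetical feature name; see the PR for actual ones)
cargo build --release --features cuda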
A manual compilation guide has been provided here: https://llama-node.vercel.app/docs/cuda