LocalAI
Errors loading models cublas
Hello,
Since the latest changes to model quantization in llama.cpp, I am no longer able to load any model into GPU memory.
When I try to load an older-quantization model such as vicuna-7b-1.1.ggmlv3.q4_0.bin, the logged output is:
localai-api-1 | 12:28AM DBG Loading model in memory from file: /models/vicuna-7b-1.1.ggmlv3.q4_0.bin
localai-api-1 | llama.cpp: loading model from /models/vicuna-7b-1.1.ggmlv3.q4_0.bin
localai-api-1 | llama_model_load_internal: format = ggjt v3 (latest)
localai-api-1 | llama_model_load_internal: n_vocab = 32000
localai-api-1 | llama_model_load_internal: n_ctx = 1024
localai-api-1 | llama_model_load_internal: n_embd = 4096
localai-api-1 | llama_model_load_internal: n_mult = 256
localai-api-1 | llama_model_load_internal: n_head = 32
localai-api-1 | llama_model_load_internal: n_layer = 32
localai-api-1 | llama_model_load_internal: n_rot = 128
localai-api-1 | llama_model_load_internal: ftype = 2 (mostly Q4_0)
localai-api-1 | llama_model_load_internal: n_ff = 11008
localai-api-1 | llama_model_load_internal: n_parts = 1
localai-api-1 | llama_model_load_internal: model size = 7B
localai-api-1 | llama_model_load_internal: ggml ctx size = 3615.71 MB
localai-api-1 | WARNING: failed to allocate 3615.71 MB of pinned memory: out of memory
localai-api-1 | warning: failed to mlock 3791351808-byte buffer (after previously locking 0 bytes): Cannot allocate memory
localai-api-1 | Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
localai-api-1 | ggml_init_cublas: found 1 CUDA devices:
localai-api-1 | Device 0: NVIDIA GeForce RTX 3070
localai-api-1 | llama_model_load_internal: using CUDA for GPU acceleration
localai-api-1 | llama_model_load_internal: mem required = 5407.71 MB (+ 17592185865892.00 MB per state)
localai-api-1 | llama_model_load_internal: offloading 0 layers to GPU
localai-api-1 | llama_model_load_internal: total VRAM used: 0 MB
localai-api-1 | fatal error: unexpected signal during runtime execution
localai-api-1 | [signal SIGSEGV: segmentation violation code=0x1 addr=0x1880014b161 pc=0x1880014b161]
localai-api-1 |
localai-api-1 | runtime stack:
localai-api-1 | runtime.throw({0x1387f22?, 0x1a?})
localai-api-1 | /usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7ffe8453fef8 sp=0x7ffe8453fec8 pc=0x4ace7d
localai-api-1 | runtime.sigpanic()
localai-api-1 | /usr/local/go/src/runtime/signal_unix.go:825 +0x3e9 fp=0x7ffe8453ff58 sp=0x7ffe8453fef8 pc=0x4c3329
localai-api-1 |
localai-api-1 | goroutine 22 [syscall]:
localai-api-1 | runtime.cgocall(0x9b7da0, 0xc0000b0350)
localai-api-1 | /usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc0000b0328 sp=0xc0000b02f0 pc=0x47bbdc
localai-api-1 | github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x439d1a0, 0x400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x44291c0, ...)
localai-api-1 | _cgo_gotypes.go:233 +0x4d fp=0xc0000b0350 sp=0xc0000b0328 pc=0x9026ad
localai-api-1 | github.com/go-skynet/go-llama%2ecpp.New({0xc0000c20c0, 0x25}, {0xc000096140, 0x4, 0x1?})
localai-api-1 | /build/go-llama/llama.go:26 +0x236 fp=0xc0000b0450 sp=0xc0000b0350 pc=0x902d36
localai-api-1 | github.com/go-skynet/LocalAI/pkg/model.llamaLM.func1({0xc0000c20c0?, 0x138442d?})
localai-api-1 | /build/pkg/model/initializers.go:116 +0x2a fp=0xc0000b0488 sp=0xc0000b0450 pc=0x90884a
localai-api-1 | github.com/go-skynet/LocalAI/pkg/model.(*ModelLoader).LoadModel(0xc0001d4a50, {0xc0005441e0, 0x1d}, 0xc000096180)
localai-api-1 | /build/pkg/model/loader.go:127 +0x1fe fp=0xc0000b0580 sp=0xc0000b0488 pc=0x90a77e
localai-api-1 | github.com/go-skynet/LocalAI/pkg/model.(*ModelLoader).BackendLoader(0xc0001d4a50, {0xc00002d407, 0x5}, {0xc0005441e0, 0x1d}, {0xc000096140, 0x4, 0x4}, 0x6)
localai-api-1 | /build/pkg/model/initializers.go:142 +0x34d fp=0xc0000b0620 sp=0xc0000b0580 pc=0x908ecd
localai-api-1 | github.com/go-skynet/LocalAI/api.ModelInference({_, _}, _, {{{0xc0005441e0, 0x1d}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...)
localai-api-1 | /build/api/prediction.go:254 +0x17d fp=0xc0000b0918 sp=0xc0000b0620 pc=0x983fdd
localai-api-1 | github.com/go-skynet/LocalAI/api.ComputeChoices({0xc0000c2090, 0x29}, 0xc0000a8000, 0xc000246b00, 0xc000606850?, 0x17dda08, 0x6?)
localai-api-1 | /build/api/prediction.go:592 +0x138 fp=0xc0000b11b8 sp=0xc0000b0918 pc=0x987f58
localai-api-1 | github.com/go-skynet/LocalAI/api.completionEndpoint.func2(0xc000246580)
localai-api-1 | /build/api/openai.go:252 +0x95a fp=0xc0000b1348 sp=0xc0000b11b8 pc=0x97cd5a
localai-api-1 | github.com/gofiber/fiber/v2.(*App).next(0xc00013b200, 0xc000246580)
localai-api-1 | /go/pkg/mod/github.com/gofiber/fiber/[email protected]/router.go:144 +0x1bf fp=0xc0000b13f0 sp=0xc0000b1348 pc=0x8c8cff
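As an aside, the mlock warning in the log above ("Try increasing RLIMIT_MLOCK") is separate from the crash itself; when running under Docker it can usually be addressed by raising the container's locked-memory limit. A sketch, assuming the stock LocalAI image and a models directory in the current path (image tag and paths are illustrative):

```shell
# Raise the container's memlock limit so llama.cpp can mlock the model
# buffer (-1:-1 means unlimited soft and hard limits).
docker run --rm -p 8080:8080 \
  --ulimit memlock=-1:-1 \
  -v "$PWD/models:/models" \
  quay.io/go-skynet/local-ai:latest
```

The equivalent in docker-compose is the `ulimits: memlock:` key on the service.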
Meanwhile, with one of the newer quantization models like vicuna-7b-1.1.ggmlv3.q4_K_S.bin:
localai-api-1 | 12:29AM DBG Loading model in memory from file: /models/vicuna-7b-1.1.ggmlv3.q4_K_S.bin
localai-api-1 | llama.cpp: loading model from /models/vicuna-7b-1.1.ggmlv3.q4_K_S.bin
localai-api-1 | fatal error: unexpected signal during runtime execution
localai-api-1 | [signal SIGFPE: floating-point exception code=0x1 addr=0xa748fe pc=0xa748fe]
localai-api-1 |
localai-api-1 | runtime stack:
localai-api-1 | runtime.throw({0x1387f22?, 0x78cdd4f88c235e00?})
localai-api-1 | /usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7ffc98bba340 sp=0x7ffc98bba310 pc=0x4ace7d
localai-api-1 | runtime.sigpanic()
localai-api-1 | /usr/local/go/src/runtime/signal_unix.go:825 +0x3e9 fp=0x7ffc98bba3a0 sp=0x7ffc98bba340 pc=0x4c3329
localai-api-1 |
localai-api-1 | goroutine 21 [syscall]:
localai-api-1 | runtime.cgocall(0x9b7da0, 0xc0003fc350)
localai-api-1 | /usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc0003fc328 sp=0xc0003fc2f0 pc=0x47bbdc
localai-api-1 | github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x35c61a0, 0x400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x36521c0, ...)
localai-api-1 | _cgo_gotypes.go:233 +0x4d fp=0xc0003fc350 sp=0xc0003fc328 pc=0x9026ad
localai-api-1 | github.com/go-skynet/go-llama%2ecpp.New({0xc0000d01b0, 0x27}, {0xc000024160, 0x4, 0x1?})
localai-api-1 | /build/go-llama/llama.go:26 +0x236 fp=0xc0003fc450 sp=0xc0003fc350 pc=0x902d36
localai-api-1 | github.com/go-skynet/LocalAI/pkg/model.llamaLM.func1({0xc0000d01b0?, 0x138442d?})
localai-api-1 | /build/pkg/model/initializers.go:116 +0x2a fp=0xc0003fc488 sp=0xc0003fc450 pc=0x90884a
localai-api-1 | github.com/go-skynet/LocalAI/pkg/model.(*ModelLoader).LoadModel(0xc00017ea20, {0xc000176440, 0x1f}, 0xc0000241a0)
localai-api-1 | /build/pkg/model/loader.go:127 +0x1fe fp=0xc0003fc580 sp=0xc0003fc488 pc=0x90a77e
localai-api-1 | github.com/go-skynet/LocalAI/pkg/model.(*ModelLoader).BackendLoader(0xc00017ea20, {0xc00002cbb7, 0x5}, {0xc000176440, 0x1f}, {0xc000024160, 0x4, 0x4}, 0x6)
localai-api-1 | /build/pkg/model/initializers.go:142 +0x34d fp=0xc0003fc620 sp=0xc0003fc580 pc=0x908ecd
localai-api-1 | github.com/go-skynet/LocalAI/api.ModelInference({_, _}, _, {{{0xc000176440, 0x1f}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...)
localai-api-1 | /build/api/prediction.go:254 +0x17d fp=0xc0003fc918 sp=0xc0003fc620 pc=0x983fdd
localai-api-1 | github.com/go-skynet/LocalAI/api.ComputeChoices({0xc0000d0180, 0x29}, 0xc00009a280, 0xc0000dcb00, 0xc0001dc920?, 0x17dda08, 0x6?)
localai-api-1 | /build/api/prediction.go:592 +0x138 fp=0xc0003fd1b8 sp=0xc0003fc918 pc=0x987f58
localai-api-1 | github.com/go-skynet/LocalAI/api.completionEndpoint.func2(0xc0000dc580)
localai-api-1 | /build/api/openai.go:252 +0x95a fp=0xc0003fd348 sp=0xc0003fd1b8 pc=0x97cd5a
I think the first is normal behaviour, but the second is not.
Thanks :)
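For anyone trying to reproduce: both stack traces go through completionEndpoint in api/openai.go, so a minimal completion request should exercise the same code path. A sketch, assuming LocalAI is listening on its default port 8080 and the model file is present under /models:

```shell
# Hypothetical request; the model name must match a file in /models.
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vicuna-7b-1.1.ggmlv3.q4_K_S.bin", "prompt": "Hello"}'
```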
Same here. I don't think either is normal; both were working yesterday.
Can someone take a look?
Until it is fixed, I recommend using this Docker image, which bundles an older go-llama.cpp:
quay.io/go-skynet/local-ai:sha-49a2b30-cublas-cuda12
It is an older build (from a few days ago) but works with models like q4_0.
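For example, pinning that tag (flags and paths are illustrative; adjust to your setup, and `--gpus all` assumes the NVIDIA container runtime is installed):

```shell
# Run the pinned, known-working cublas image with GPU access.
docker run --rm -p 8080:8080 \
  --gpus all \
  -v "$PWD/models:/models" \
  quay.io/go-skynet/local-ai:sha-49a2b30-cublas-cuda12
```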
Can you reproduce this with llama.cpp directly, or is it only binding-related?
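One way to check: build llama.cpp at the commit the binding pins and load the same file with its main example. If it also crashes there, the bug is upstream; if it loads cleanly, the problem is likely in the go-llama.cpp binding. A sketch (flag names may differ between llama.cpp versions):

```shell
# -m: model path, -p: prompt, -ngl: number of layers to offload to the GPU
./main -m /models/vicuna-7b-1.1.ggmlv3.q4_K_S.bin -p "Hello" -ngl 32
```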
Moving to LocalAI for more visibility.