LocalAI
docker-compose method could not load rwkv model, in spite of proper folder structure
Hello,
I tried the docker-compose method outlined in the README, and here is the output:
$ docker-compose up -d --build
/usr/lib/python3/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
Building api
[+] Building 1.6s (13/13) FINISHED
 => [internal] load .dockerignore                                          0.7s
 => => transferring context: 73B                                           0.6s
 => [internal] load build definition from Dockerfile.dev                   0.7s
 => => transferring dockerfile: 352B                                       0.6s
 => [internal] load metadata for docker.io/library/debian:11               0.5s
 => [internal] load metadata for docker.io/library/golang:1.20             0.5s
 => [builder 1/5] FROM docker.io/library/golang:1.20@sha256:eaf12671a7ac51fd23786109c19bd0150c8f894e2672024faac3d14ed4  0.0s
 => [internal] load build context                                          0.4s
 => => transferring context: 5.91kB                                        0.3s
 => [stage-1 1/2] FROM docker.io/library/debian:11@sha256:63d62ae233b588d6b426b7b072d79d1306bfd02a72bff1fc045b8511cc89  0.0s
 => CACHED [builder 2/5] WORKDIR /build                                    0.0s
 => CACHED [builder 3/5] RUN apt-get update && apt-get install -y cmake    0.0s
 => CACHED [builder 4/5] COPY . .                                          0.0s
 => CACHED [builder 5/5] RUN make build                                    0.0s
 => CACHED [stage-1 2/2] COPY --from=builder /build/local-ai /usr/bin/local-ai  0.0s
 => exporting to image                                                     0.0s
 => => exporting layers                                                    0.0s
 => => writing image sha256:3906afc65f2d467953e4aab83a55aaf75498b858a0ebf299bc32f8eed16b4328  0.0s
 => => naming to quay.io/go-skynet/local-ai:latest                         0.0s
Starting localai_api_1 ... done
But then the rwkv models (multiple ones, properly converted) do not load:
$ curl http://172.18.0.2:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "rwkv.cpp-1.5b-11x.bin", "messages": [{"role": "user", "content": "Say this is a test!"}], "temperature": 0.7 }'
{"error":{"code":500,"message":"could not load model - all backends returned error: 5 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* could not load model\n\n","type":""}}
Please like this issue if you have the same problem.
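To surface which backend actually failed behind that generic 500, tailing the container logs should help; a sketch, assuming the compose service is named api as in the repo's docker-compose.yaml:
$ docker-compose logs -f api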
Hi @bennmann,
Can you show the content of your models directory, and what the /models endpoint returns?
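Something like this should show both (endpoint path per the OpenAI-compatible API):
$ ls -al models/                         # host-side models directory
$ curl http://localhost:8080/v1/models   # what the API itself sees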
Mine built properly, but I'm facing the following error in the docker-compose logs:
api_1 | llama.cpp: loading model from /models/rwkv-1b5-v11
api_1 | error loading model: unknown (magic, version) combination: 67676d66, 00000064; is this really a GGML file?
api_1 | llama_init_from_file: failed to load model
api_1 | gptj_model_load: invalid model file '/models/rwkv-1b5-v11' (bad magic)
api_1 | gptj_bootstrap: failed to load model from '/models/rwkv-1b5-v11'
api_1 | gpt2_model_load: invalid model file '/models/rwkv-1b5-v11' (bad magic)
api_1 | gpt2_bootstrap: failed to load model from '/models/rwkv-1b5-v11'
api_1 | stablelm_model_load: invalid model file '/models/rwkv-1b5-v11' (bad magic)
api_1 | stablelm_bootstrap: failed to load model from '/models/rwkv-1b5-v11'
api_1 | SIGILL: illegal instruction
api_1 | PC=0x8ea6ed m=0 sigcode=2
api_1 | signal arrived during cgo execution
api_1 | instruction bytes: 0xc4 0xe3 0x7d 0x39 0x45 0xa8 0x1 0x48 0x8b 0x43 0xe8 0x48 0x8b 0xbd 0xd8 0x7c
The model above was converted using the following command:
python rwkv/convert_pytorch_to_ggml.py ~/Downloads/RWKV-4-Raven-1B5-v11-Eng99%-Other1%-20230425-ctx4096.pth ~/Downloads/rwkv-1b5-v11 float16
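A quick sanity check on the converted file is to dump its first two 32-bit header words (magic and version); a sketch using od. As far as I can tell, the 67676d66 / 00000064 pair in the log above is the rwkv.cpp magic ('ggmf') and file version 100, so the file itself looks valid; it's the llama backend that is mistakenly probing it first:
$ od -A x -t x4 -N 8 ~/Downloads/rwkv-1b5-v11   # expect 67676d66 00000064 for a good rwkv.cpp conversion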
The following is the output of gpt4all-chat's test_hw on my workstation:
gpt4all hardware test results:
AVX = 1
AVX2 = 0
FMA = 0
SSE3 = 1
your hardware supports the "bare_minimum" version of gpt4all.
Does RWKV require AVX2 to execute?
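For what it's worth, the faulting bytes from the SIGILL dump above can be decoded offline; a sketch with binutils (the disassembly in the comment is a manual decode, worth double-checking):
$ printf '\xc4\xe3\x7d\x39\x45\xa8\x01' > /tmp/insn.bin
$ objdump -D -b binary -m i386:x86-64 /tmp/insn.bin
# should show something like: vextracti128 $0x1,%ymm0,-0x58(%rbp)
If so, that is an AVX2 instruction, which would line up with the AVX2 = 0 result above.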
Can you check latest master? I've also added an example: https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv
I did a git pull, and then similar issues arise:
/LocalAI$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "rwkv.cpp-1.5b-11x.bin",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
{"error":{"code":500,"message":"could not load model - all backends returned error: 5 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* could not load model\n\n","type":""}}
I do get a proper response when querying models:
{"object":"list","data":[{"id":"20B_tokenizer.json","object":"model"},{"id":"RWKV-14B-11x-Q5_1.bin","object":"model"},{"id":"RWKV-4-Raven-7B-v10-Eng99-20230418-ctx8192-cppfp16.bin","object":"model"},{"id":"rwkv.cpp-1.5b-11x.bin","object":"model"},{"id":"rwkv.tokenizer.json"," ```
The models DO begin to load into RAM (watching free -h in a terminal, I can see used space increase), but I still get error 500.
Thanks for any help on next steps.
Can you try the steps in https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv ?
I ran out of disk space and am having a hard time getting my environment stable again... I will report back in some time.
OK, hope this helps someone else too: containers kept stacking up in Docker and ate my disk space away. I had to run:
~$ docker system prune -a -f
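For future reference, a narrower alternative that only clears Docker's build cache, without deleting every unused image:
~$ docker builder prune -f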
Then I was able to remove the larger models and begin another test with only the 1.5B-size model:
docker-compose up -d --build
....
....
=> [builder 5/5] RUN make build 53.3s
=> [stage-1 2/2] COPY --from=builder /build/local-ai /usr/bin/local-ai 0.3s
=> exporting to image 0.2s
=> => exporting layers 0.1s
=> => writing image sha256:bc5368903e72d4a9766857d0418a3057f605d3579c597606fc984c2d28f60afd 0.0s
=> => naming to quay.io/go-skynet/local-ai:latest 0.0s
Creating rwkv_api_1 ... done
But alas, I still get HTTP 500 (the models endpoint still works):
$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "rwkv.cpp-1.5b-11x.bin", "messages": [{"role": "user", "content": "Say this is a test!"}], "temperature": 0.7 }'
{"error":{"code":500,"message":"could not load model - all backends returned error: 5 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* could not load model\n\n","type":""}}
Also, I haven't found the most graceful way to get LocalAI to stop; I just docker kill it to get RAM back:
$ docker kill rwkv_api_1
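A gentler way should be the standard docker-compose commands:
$ docker-compose stop   # stop the containers but keep them around
$ docker-compose down   # stop and remove the containers (and the network)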
I had this problem when I first ran it, and it turned out to be because it doesn't handle symlinks properly. Once I switched from a symlink to just copying the file into the models directory, it was fixed.
Today I tested this way, and confirmed all the folder structures were chown'd to my user.
I also tried the local make build install method and still get an RWKV HTTP 500.
Has anyone gotten past this HTTP 500 issue?
I have the same error: the model can't be loaded, on the latest main version with everything set up correctly (BUILD_TYPE=generic). I tried to run rwkv.cpp directly on the very same model that cannot be loaded with LocalAI, and it worked well... I've tried updating the commit version of go-rwkv.cpp in the Makefile, but that didn't work either.
Did you put the tokenizer file next to the rwkv model? There is an rwkv example over here: https://github.com/go-skynet/LocalAI/tree/master/examples/rwkv
Yes, the tokenizer is there, and it's the same issue.
Did you try to run the example step by step? Can you report the full output logs?
To understand what's going on, I would need to know:
- Version of LocalAI you are using
- What is the content of your model folder, and if you had configured the model with a YAML file, please post it as well
- Full output logs of the API running with --debug, with your steps
- Version of LocalAI I am using: latest, commit 850a690290ac32079efa1e5f779bdd082957d380 (I removed and re-cloned the project to reproduce the example step by step)
- Content of my model folder (the model is configured with the example's YAML file; a reconstructed sketch of it is at the end of this message):
debian@ai3:~/workspace/LocalAI/examples/rwkv$ ls -al models/
total 1205536
drwxr-xr-x 2 debian debian 4096 May 14 13:49 .
drwxr-xr-x 4 debian debian 4096 May 14 13:41 ..
-rw-r--r-- 1 debian debian 296 May 14 13:41 gpt-3.5-turbo.yaml
-rw-r--r-- 1 root root 1231971925 May 14 13:49 rwkv
-rw-r--r-- 1 debian debian 2467981 May 14 13:50 rwkv.tokenizer.json
-rw-r--r-- 1 debian debian 397 May 14 13:41 rwkv_chat.tmpl
-rw-r--r-- 1 debian debian 44 May 14 13:41 rwkv_completion.tmpl
md5sum of rwkv model : 71494609f13616d7fb8e9daa101cefd0 (used model in the example)
- Full output logs of the API running with --debug, with my steps:
Starting LocalAI using 4 threads, with models path: /models
┌───────────────────────────────────────────────────┐
│                   Fiber v2.45.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............ 17  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘
1:56PM DBG Model: gpt-3.5-turbo (config: {OpenAIRequest:{Model:rwkv File: ResponseFormat: Language: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.8 TopK:80 Temperature:0.9 Maxtokens:100 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0} Name:gpt-3.5-turbo StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:14 Debug:false Roles:map[assistant:Alice: system:Alice: user:Bob:] Embeddings:false Backend:rwkv TemplateConfig:{Completion:rwkv_completion Chat:rwkv_chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 PromptStrings:[] InputStrings:[] InputToken:[]})
1:57PM DBG Request received: {"model":"gpt-3.5-turbo","file":"","response_format":"","language":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you?"}],"stream":false,"echo":false,"top_p":0.8,"top_k":80,"temperature":0.9,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"seed":0}
1:57PM DBG Parameter Config: &{OpenAIRequest:{Model:rwkv File: ResponseFormat: Language: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.8 TopK:80 Temperature:0.9 Maxtokens:100 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0} Name:gpt-3.5-turbo StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:14 Debug:true Roles:map[assistant:Alice: system:Alice: user:Bob:] Embeddings:false Backend:rwkv TemplateConfig:{Completion:rwkv_completion Chat:rwkv_chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 PromptStrings:[] InputStrings:[] InputToken:[]}
1:57PM DBG Template found, input modified to: The following is a verbose detailed conversation between Bob and a woman, Alice. Alice is intelligent, friendly and likeable. Alice is likely to agree with Bob.
Bob: Hello Alice, how are you doing?
Alice: Hi Bob! Thanks, I'm fine. What about you?
Bob: I am very good! It's nice to see you. Would you mind me chatting with you for a while?
Alice: Not at all! I'm listening.
Bob: How are you?
Alice:
1:57PM DBG Loading model in memory from file: /models/rwkv
[172.20.0.1]:49328 500 - POST /v1/chat/completions
PS: looking at the ls -al of models, I thought this could be a permission issue (the rwkv file is owned by root), but after updating permissions I still have the exact same issue.
The only difference with the example is that I used BUILD_TYPE=generic.
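For reference, a sketch of what that gpt-3.5-turbo.yaml presumably contains, reconstructed from the DBG config dump above and the examples/rwkv sample (field names assumed, not copied from my actual file):
$ cat models/gpt-3.5-turbo.yaml
name: gpt-3.5-turbo
parameters:
  model: rwkv
  top_k: 80
  top_p: 0.8
  temperature: 0.9
  max_tokens: 100
context_size: 1024
threads: 14
backend: "rwkv"
roles:
  user: "Bob:"
  system: "Alice:"
  assistant: "Alice:"
template:
  completion: rwkv_completion
  chat: rwkv_chat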
I re-tried it locally, and indeed there seems to be a regression introduced in https://github.com/go-skynet/LocalAI/pull/234. A fix is on its way in https://github.com/go-skynet/LocalAI/pull/255, and I'll tag a patch release afterwards. Thanks for the detective work! This definitely needs more love in the CI to avoid regressions in the future. Please re-open or create other issues if you still have problems with rwkv.
Looks like some filename variables may not be gracefully accounting for the examples/rwkv folder structure now? This is closer, though (using the docker-compose method):
$ docker-compose up -d --build
....
....
#0 6.402 CMake Error: The source "/build/go-bert/bert.cpp/CMakeLists.txt" does not match the source "/media/username/LocalAI/go-bert/bert.cpp/CMakeLists.txt" used to generate cache. Re-run cmake with a different source directory.
#0 6.403 make[1]: *** [Makefile:150: bert.o] Error 1
#0 6.403 make[1]: Leaving directory '/build/go-bert'
#0 6.403 make: *** [Makefile:94: go-bert/libgobert.a] Error 2
------
Dockerfile.dev:9
--------------------
7 | RUN apt-get update && apt-get install -y cmake
8 | COPY . .
9 | >>> RUN make build
10 |
11 | FROM debian:$DEBIAN_VERSION
--------------------
ERROR: failed to solve: process "/bin/sh -c make build" did not complete successfully: exit code: 2
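The "does not match the source ... used to generate cache" error usually means CMake cache files from a host-side build got COPY'd into the image's build context. A cleanup sketch, assuming the host build artifacts can be discarded (the make clean target and the manual cache removal are interchangeable here):
$ make clean                              # drop host-side build artifacts, if the target exists
$ find . -name CMakeCache.txt -delete     # or remove the stale CMake caches directly
$ docker-compose build --no-cache && docker-compose up -d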