
chat.exe exits instantly with no text or error message

Open • yigalnavon opened this issue Mar 21 '23 • 14 comments

chat.exe produces a blank line with no text and then exits. On Windows 10, compiled with CMake.

Please help.

yigalnavon avatar Mar 21 '23 00:03 yigalnavon

D:\ALPACA\alpaca-win>chat.exe
main: seed = 1679458006
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.34 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000



D:\ALPACA\alpaca-win>

Same problem. I downloaded the compiled exe from Releases. It shows a blank line after the model loads and then exits. I also tried running the same exe file on another PC (with a newer CPU, but the same Windows 10) and there was no such error.

technoqz avatar Mar 22 '23 04:03 technoqz
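
Since the identical exe runs on a newer CPU but dies instantly on an older one, a plausible culprit is that the prebuilt binary was compiled for instruction sets (AVX2, F16C) that the older machine lacks. Here is a minimal stand-alone check of the relevant CPUID feature bits; a diagnostic sketch, not code from this repo:

// cpu_check.c - print the CPUID feature bits that matter for chat.exe
// (AVX / F16C / FMA live in leaf 1 ECX, AVX2 in leaf 7 EBX).
// Build: gcc cpu_check.c -o cpu_check (MinGW) or cl cpu_check.c (MSVC)
#include <stdio.h>
#ifdef _MSC_VER
#include <intrin.h>
static void cpuid(int leaf, int subleaf, unsigned out[4]) {
    int r[4];
    __cpuidex(r, leaf, subleaf);
    for (int i = 0; i < 4; i++) out[i] = (unsigned)r[i];
}
#else
#include <cpuid.h>
static void cpuid(int leaf, int subleaf, unsigned out[4]) {
    __get_cpuid_count(leaf, subleaf, &out[0], &out[1], &out[2], &out[3]);
}
#endif

int main(void) {
    unsigned r1[4], r7[4];
    cpuid(1, 0, r1);  // leaf 1: ECX carries the AVX/F16C/FMA bits
    cpuid(7, 0, r7);  // leaf 7, subleaf 0: EBX carries the AVX2 bit
    printf("AVX  = %u\n", (r1[2] >> 28) & 1);
    printf("F16C = %u\n", (r1[2] >> 29) & 1);
    printf("FMA  = %u\n", (r1[2] >> 12) & 1);
    printf("AVX2 = %u\n", (r7[1] >> 5) & 1);
    return 0;
}

If a feature the binary was built with prints 0 here, an instant exit with no message is exactly the symptom in this thread; Windows consoles often swallow the illegal-instruction error.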

I have the same problem.

F:\lama\alpaca-win>.\chat.exe -m ggml-alpaca-7b-q4.bin
main: seed = 1679476791
llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.34 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000



F:\lama\alpaca-win>.\chat.exe -m ggml-alpaca-7b-q4.bin

andrenaP avatar Mar 22 '23 09:03 andrenaP

Mine quits after accepting my question:

PS E:\repo\langchain-alpaca\dist\binary>  ./chat.exe --model "e:\repo\langchain-alpaca\model\ggml-alpaca-7b-q4.bin"  --threads 6
main: seed = 1679490011
llama_model_load: loading model from 'e:\repo\langchain-alpaca\model\ggml-alpaca-7b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.34 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'e:\repo\langchain-alpaca\model\ggml-alpaca-7b-q4.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000


== Running in chat mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMA.
 - If you want to submit another line, end your input in '\'.

> Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. harrison went to harvard ankush went to princeton Question Where did harrison go to college Helpful Answer

I have enough memory for it, so it's not an OOM.

linonetwo avatar Mar 22 '23 13:03 linonetwo

I have exactly the same issue. Compiled with CMake and VS2019. I get the info about the parameters, it waits about 10 seconds, and then it's over without producing any error.

aofalcao avatar Mar 22 '23 13:03 aofalcao

I have exactly the same issue. Compiled with CMake and VS2019. I get the info about the parameters, it waits about 10 seconds, and then it's over without producing any error.

I guess I found part of the problem. It's not necessarily the real cause, and not even close to a solution, but it may at least help the maintainers. In the function llama_eval, this call:

ggml_graph_compute (ctx0, &gf);

is the one that never finishes, and the program aborts.

aofalcao avatar Mar 22 '23 14:03 aofalcao
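
For context on that call: in llama_eval the entire forward pass is first recorded into a ggml compute graph and only executed by that single call, so a crash "inside" ggml_graph_compute is really a crash in one of the recorded tensor kernels, e.g. an illegal SIMD instruction. Roughly, as a simplified sketch of the old ggml API rather than the verbatim source:

// tail of llama_eval() in alpaca.cpp, heavily simplified
struct ggml_cgraph gf = {};
gf.n_threads = n_threads;

// ... every matmul / attention op of the forward pass has been recorded
// lazily into ctx0 above; nothing has actually executed yet ...

ggml_build_forward_expand(&gf, inpL);  // finalize the graph ending at the logits
ggml_graph_compute(ctx0, &gf);         // run all recorded ops; an unsupported
                                       // instruction inside any kernel kills
                                       // the process right here, with no
                                       // error path left to print a message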

Same issue, with even less text than the others.

E:\AI-Chat\alpaca-win>chat -m ggml-alpaca-13b-q4.bin
main: seed = 1679516224
llama_model_load: loading model from 'ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 10959.49 MB

E:\AI-Chat\alpaca-win>

Clicking on chat.exe does not load anything.

rangedreign avatar Mar 22 '23 20:03 rangedreign

Me too. Windows 10, 32 GB RAM. "chat.exe has stopped working". Same for 7B and 13B.

I don't even get to ask it a question.

fancellu avatar Mar 22 '23 21:03 fancellu

Same here

D:\StableDiffusion\Alpaca>chat.exe -i -m ggml-alpaca-13b-q4.bin -t 1
main: seed = 1679581051
llama_model_load: loading model from 'ggml-alpaca-13b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 10959.49 MB
llama_model_load: memory_size =  3200.00 MB, n_mem = 81920
llama_model_load: loading model part 1/1 from 'ggml-alpaca-13b-q4.bin'
llama_model_load: ............................................. done
llama_model_load: model size =  7759.39 MB / num tensors = 363

system_info: n_threads = 1 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

16 GB RAM, of which about 12 GB are in use during "startup".

StanDaMan0505 avatar Mar 23 '23 14:03 StanDaMan0505
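
A quick sanity check on those numbers, reading the 13B log lines above back against each other (a back-of-the-envelope decomposition, not something the program prints):

model weights    7759.39 MB
kv memory        3200.00 MB
total           10959.39 MB   (the log reports ggml ctx size = 10959.49 MB, i.e. ~0.1 MB of overhead)

So the 13B model needs roughly 11 GB at load time; on a 16 GB machine that is tight but should fit, which is consistent with these exits not being out-of-memory conditions.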

Building with MINGW might help

  1. Install MSYS2 (www.msys2.org)
  2. Place the sources into C:\msys64\home\USERNAME\alpaca.cpp and apply this patch: https://github.com/antimatter15/alpaca.cpp/pull/84
  3. Inside a UCRT64 terminal, run:
     pacman -S mingw-w64-ucrt-x86_64-gcc
     pacman -S make
     cd alpaca.cpp
     make

chat.exe should appear at C:\msys64\home\USERNAME\alpaca.cpp\chat.exe

SiemensSchuckert avatar Mar 23 '23 16:03 SiemensSchuckert
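
If the build succeeds, a quick smoke test from the same UCRT64 terminal (assuming the 7B model file used elsewhere in this thread has been copied into the build directory):

cd ~/alpaca.cpp
./chat.exe -m ggml-alpaca-7b-q4.bin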

Building with MINGW might help

  1. Install MSYS2 (www.msys2.org)
  2. Place the sources into C:\msys64\home\USERNAME\alpaca.cpp and apply this patch: Add support for building on native Windows via MINGW. #84
  3. Inside a UCRT64 terminal, run:
     pacman -S mingw-w64-ucrt-x86_64-gcc
     pacman -S make
     cd alpaca.cpp
     make

chat.exe should appear at C:\msys64\home\USERNAME\alpaca.cpp\chat.exe

Doesn't work for me.

When I run "make":

D:/msys64/ucrt64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_
inline' '_mm256_cvtph_ps': target specific option mismatch
   52 | _mm256_cvtph_ps (__m128i __A)
      | ^~~~~~~~~~~~~~~
ggml.c:911:33: note: called from here
  911 | #define GGML_F32Cx8_LOAD(x)     _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:911:33: note: in definition of macro 'GGML_F32Cx8_LOAD'
  911 | #define GGML_F32Cx8_LOAD(x)     _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
      |                                 ^~~~~~~~~~~~~~~
ggml.c:1274:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
 1274 |             ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
      |                     ^~~~~~~~~~~~~~~~~
D:/msys64/ucrt64/lib/gcc/x86_64-w64-mingw32/12.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_
inline' '_mm256_cvtph_ps': target specific option mismatch
   52 | _mm256_cvtph_ps (__m128i __A)

etc

If I don't apply the patch, it compiles, but chat.exe gives me an illegal instruction.

fancellu avatar Mar 30 '23 16:03 fancellu
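
The "target specific option mismatch" on _mm256_cvtph_ps means the compiler is refusing to inline an F16C intrinsic because ggml.c is being built without F16C enabled while the F16C code path is selected. If, and only if, your CPU actually supports F16C, adding the flag is one way past it; a sketch, assuming the stock Makefile variable names:

# in the Makefile, next to the existing architecture flags
CFLAGS   += -mavx -mavx2 -mfma -mf16c
CXXFLAGS += -mavx -mavx2 -mfma -mf16c

If the CPU lacks F16C or AVX2, the build will then succeed but die at runtime with exactly the illegal instruction described above, so on such machines the flags must be dropped instead.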

Try this version https://github.com/SiemensSchuckert/alpaca.cpp

SiemensSchuckert avatar Mar 31 '23 00:03 SiemensSchuckert

Try this version https://github.com/SiemensSchuckert/alpaca.cpp

Thanks, that works fine. yaaaay

fancellu avatar Mar 31 '23 16:03 fancellu

Same problem.

sarfraznawaz2005 avatar Apr 02 '23 06:04 sarfraznawaz2005

@SiemensSchuckert that worked, thanks a lot :)

sarfraznawaz2005 avatar Apr 02 '23 06:04 sarfraznawaz2005