
[User] Interactive mode immediately exits on Windows with Zig

Open karashiiro opened this issue 1 year ago • 14 comments

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When building with Zig, running the example command for Alpaca results in an interactive prompt that I can type in.

Current Behavior

When building with Zig, running the example command for Alpaca results in an interactive prompt that exits immediately, without producing an error message. Non-interactive mode works fine.

When using Command Prompt specifically, the process exits and leaves the console text green; that doesn't happen in PowerShell or Git Bash, which reset the console color. I think that means it isn't reaching this line.

The commands, for reference:

zig build -Drelease-fast
.\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7

I also tried using LLaMA in interactive mode, which resulted in the same behavior.

.\zig-out\bin\main.exe -m D:\llama\LLaMA\7B\ggml-model-q4_0.bin -ins

Building with MSVC via CMake produces a binary that works perfectly fine (besides also leaving the Command Prompt console text green when exiting with Ctrl+C).

mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
.\build\bin\Release\main.exe -m .\models\ggml-alpaca-7b-q4.bin -f .\prompts\alpaca.txt --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using:

Device name: DESKTOP-HP640DM
Processor: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz, 3.50 GHz
Installed RAM: 24.0 GB (23.8 GB usable)
System type: 64-bit operating system, x64-based processor
Pen and touch: Pen support

  • Operating System:

Edition: Windows 10 Home
Version: 22H2
Installed on: 3/17/2021
OS build: 19045.2728
Experience: Windows Feature Experience Pack 120.2212.4190.0

  • SDK version:
> pyenv exec python3 --version
Python 3.10.9
> cmake --version
cmake version 3.20.0-rc5
> cmake ..
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.22000.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.29.30148.0
-- The CXX compiler identification is MSVC 19.29.30148.0
<snip>
> zig version
0.10.1
> git log | head -1
commit 74f5899df4a6083fc467b620baa1cf821e37799d
  • Model checksums:
> md5sum .\models\ggml-alpaca-7b-q4.bin
\7a81638857b7e03f7e3482f3e68d78bc *.\\models\\ggml-alpaca-7b-q4.bin
> md5sum D:\llama\LLaMA\7B\ggml-model-q4_0.bin
\b96f7e3c1cd6dcc6ffd9aaf975b776e5 *D:\\llama\\LLaMA\\7B\\ggml-model-q4_0.bin

Failure Information (for bugs)

Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Clone the repo (https://github.com/ggerganov/llama.cpp/commit/74f5899df4a6083fc467b620baa1cf821e37799d)
  2. Run zig build -Drelease-fast
  3. Run the example command (adjusted slightly for the env): .\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
  4. Observe that the process exits immediately after reading the prompt

Failure Logs

Running the Zig build:

> .\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
main: seed = 1681616581
llama.cpp: loading model from .\models\ggml-alpaca-7b-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size  = 1024.00 MB

system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.200000, top_k = 10000, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 23


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

 Below is an instruction that describes a task. Write a response that appropriately completes the request.
>

> # (process exited automatically)

Running the MSVC/CMake build:

> .\build\bin\Release\main.exe -m .\models\ggml-alpaca-7b-q4.bin -f .\prompts\alpaca.txt --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
main: seed = 1681616683
llama.cpp: loading model from .\models\ggml-alpaca-7b-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size  = 1024.00 MB

system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.200000, top_k = 10000, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 23


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

 Below is an instruction that describes a task. Write a response that appropriately completes the request.
> Tell me something I don't know.
The longest river in the world is the Nile River, which stretches 6,650 km (4,130 miles) across the continent of Africa.
>

> # (Ctrl+C)

karashiiro avatar Apr 16 '23 03:04 karashiiro

Maybe @iacore can help (as the author of the build.zig file)?

prusnak avatar Apr 16 '23 08:04 prusnak

Disclaimer: I don't know much about Windows.

The information provided isn't enough; how the program exited is important.

This is likely due to UBSan being enabled by default in Zig builds, and the program hitting undefined behavior. On Linux, this always manifests as SIGILL.

Otherwise, it's a Zig bug.

iacore avatar Apr 16 '23 13:04 iacore

It builds with -Drelease-safe as well, if that's relevant. PowerShell doesn't give exit codes, which is annoying for this, but running this under Git Bash seems to suggest the exit code is 0:

$ ./zig-out/bin/main.exe -m ./models/ggml-alpaca-7b-q4.bin -ins && echo OK || echo oh no
main: seed = 1681653840
llama.cpp: loading model from ./models/ggml-alpaca-7b-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required  = 5809.32 MB (+ 1026.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 2


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.


> OK

karashiiro avatar Apr 16 '23 14:04 karashiiro

That's weird. I would recommend not using Zig to build this on Windows then.

iacore avatar Apr 16 '23 18:04 iacore

I am on a Mac M1 with a Rosetta-based kernel. I'm also getting similar output; however, I can run inference on the model fine.

llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format     = ggmf v1 (old version with no mmap support)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113744.11 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 2052.00 MB per state)

Divjyot avatar Apr 17 '23 06:04 Divjyot

Sounds like a different issue; that's all just diagnostic information, apart from the warning, which is unrelated to this.

karashiiro avatar Apr 17 '23 13:04 karashiiro

@karashiiro

I took a stab at debugging this, and I think I know why it exits. The immediate problem is that getc() prints Invalid parameter passed to C runtime function. (you can only see this in a debugger, since it is logged via OutputDebugStringA) and returns an error when called indirectly from std::getline(). The culprit is apparently this line, which forces the standard input to use UTF-16. If you comment it out, main starts working again, even though you lose the ability to enter non-ASCII text.

More precisely, I think this is what happens:

  • after _setmode(..., _O_WTEXT) you can no longer use getc() on the file stream and have to use getwc() instead
  • zig bundles a copy of libc++ on Windows by default
  • the std::getline() implementation from libc++ eventually calls __stdinbuf<_CharT>::__getchar() to read a symbol from the standard input
  • __stdinbuf<_CharT>::__getchar calls getc() unconditionally and fails
  • this leads to std::getline() also failing
  • when this happens, we exit with code 0

A small self-contained example to illustrate the problem would be this:

#include <iostream>
#include <string>
#include <fcntl.h>
#include <io.h>

int main() {
    // Switch stdin to wide (UTF-16) mode, the same call llama.cpp makes.
    _setmode(_fileno(stdin), _O_WTEXT);
    std::wstring buf;
    // Under libc++, this fails immediately: __stdinbuf reads with getc(),
    // which is invalid on a stream that is in _O_WTEXT mode.
    if (!std::getline(std::wcin, buf)) {
        std::cout << "ERROR" << std::endl;
    }
}

If you compile this program with zig c++ and run it, it will print ERROR and exit immediately. If you comment out the line with _setmode(...), it will read a line of text from stdin, as expected.

I have absolutely no idea how to fix it, though. _setmode(..., _O_WTEXT) is there for a very good reason and shouldn't be removed. As far as I can see, there is no way to make std::getline() from libc++ use getwc() instead of getc(). And I don't see a way to get rid of the bundled libc++ when building with Zig, either.
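For illustration, here is a minimal sketch of reading a wide line by hand with getwc(), sidestepping libc++'s std::getline() entirely; wide_getline is a hypothetical helper name, not anything in the codebase:

#include <cstdio>
#include <string>
#include <wchar.h>

// Read one line from stdin while it is in _O_WTEXT mode, using getwc()
// (which is valid in that mode) instead of getc() (which is not).
static bool wide_getline(std::wstring & line) {
    line.clear();
    wint_t wc;
    while ((wc = getwc(stdin)) != WEOF) {
        if (wc == L'\n') {
            return true;
        }
        line.push_back((wchar_t) wc);
    }
    return !line.empty(); // EOF after a partial line still yields that line
}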

dfyz avatar Apr 18 '23 00:04 dfyz

And I don't see a way to get rid of bundled libc++ when building with Zig, either.

zig build does allow passing custom libc overrides via --libc, per its help text, but using the default overrides from zig libc doesn't compile at all. That gives a bunch of errors in the vein of C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include/vcruntime.h:193:29: error: expected ';' after top level declarator.

Trying each of these gives slightly different parsing errors while attempting to build libc++ (after zig libc > libc.txt):

  • zig build -Dtarget=native-native-gnu --libc libc.txt
  • zig build -Dtarget=native-native-msvc --libc libc.txt

And a plain zig build -Dtarget=native-native-msvc just gives this:

> zig build -Dtarget=native-native-msvc
error: lld-link: could not open 'liblibc++.a': No such file or directory
lld-link: error: could not open 'libuuid.a': No such file or directory

I'm not sure if there's some way to set up a Zig-specific directive to block out an alternative implementation that doesn't use std::getline(), but that might be one way to manage this? I'm not sure how trivial that would be, since I'm neither a C++ expert nor a Zig one (I only tried zig build for the novelty of it 🐢).

karashiiro avatar Apr 18 '23 05:04 karashiiro

Using _setmode on console I/O streams after they have already been initialized can lead to problems and generally shouldn't be done. It should be enough to use UTF-8 with a char-based internal implementation and set the console codepage to UTF-8, as in #420.
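A minimal sketch of that approach, assuming the console honours the UTF-8 codepage for input:

#include <windows.h>
#include <iostream>
#include <string>

int main() {
    // Switch both console codepages to UTF-8 up front, before any console
    // I/O, instead of flipping stdin to UTF-16 with _setmode() later.
    SetConsoleCP(CP_UTF8);
    SetConsoleOutputCP(CP_UTF8);

    std::string line; // plain char buffer holding UTF-8 bytes
    if (std::getline(std::cin, line)) {
        std::cout << line << std::endl;
    }
}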

The issue described in #840 was that some UTF-8 characters were interpreted as zeroes, which shouldn't happen, since the UTF-8 codepage has no zero bytes in it and is compatible with char/ASCII streams; only the representation differs. UTF-16 does have zero bytes, though (for example, 'A' in UTF-16LE is the byte pair 41 00), and this is most likely where the problem stems from.

I'd suggest reverting the #840 commit, as the UTF-16 -> UTF-8 translation shouldn't be needed, and then looking again. From this comment, the actual culprit seems to be that the implementation of std::getline is broken and doesn't properly support UTF-8; unfortunately, I cannot read Chinese, though. In any case, the better starting point for investigating the actual root cause (why zero bytes are being fed into a UTF-8 stream in the first place) would be a build without #840.

The best long-term solution would probably be to use fgetc() / fread() and regular file streams instead of the std family of getline and ifstream, to remove the problems of library-specific implementations.
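For illustration, a minimal sketch of a plain-stdio line reader along those lines; console_readline is a hypothetical name, not anything in the codebase:

#include <cstdio>
#include <string>

// Read one line from stdin using only C stdio, avoiding std::getline()
// and any library-specific stream machinery.
static bool console_readline(std::string & line) {
    line.clear();
    int ch;
    while ((ch = std::fgetc(stdin)) != EOF) {
        if (ch == '\n') {
            return true;
        }
        line.push_back((char) ch);
    }
    return !line.empty(); // EOF after a partial line still yields that line
}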

anzz1 avatar Apr 18 '23 21:04 anzz1

@karashiiro

zig build does allow passing custom libc overrides via --libc

I think --libc changes the C library (e.g., ucrt, which contains getc() and getwc()), not the C++ one (e.g., libc++, which contains std::getline())? We are having problems with the latter, not the former.

And a plain zig build -Dtarget=native-native-msvc just gives this

Judging from this issue, I'd say that building C++ binaries for the MSVC ABI is generally not supported yet (although the build errors there are very different from your linking error).

I'm not sure if there's some way to set up a Zig-specific directive to block out an alternative implementation that doesn't use std::getline(), but that might be one way to manage this?

This should theoretically be possible, but I don't believe replacing std::getline() just for the sake of making the Zig build work is worth it. Might be a good idea for unrelated reasons (see below).

@anzz1

This _setmode(...) usage does strike me as weird, but apparently it's what people do in the wild. More importantly, there's an LLVM review request (updated today, no less) which I think should fix the std::getline() implementation in libc++ after _setmode(..., _O_WTEXT) (I haven't tested it, but it does replace getc() with getwc() when appropriate).

I absolutely agree that introducing the additional UTF-16 -> UTF-8 translation in #840 makes no sense, even though it fixes the immediate problem with the input. It does look like std::getline() might be broken, since I can see in the debugger that reverting #840 results in std::getline() returning 3 zero bytes for a string with 3 non-ASCII symbols. I will try to find out why this happens (thankfully, the implementation of std::getline() is open source).

dfyz avatar Apr 19 '23 01:04 dfyz

On the latest MSVC build llama-master-6667401-bin-win-avx2-x64

[screenshot: debugger view of the CRT startup sequence]

As part of CRT startup, the application mode is set to console via __set_app_type with _CONSOLE_APP (1); then _set_fmode is called with _O_TEXT (0x4000), which sets the I/O mode to narrow text. Later, _initterm is called, which allocates the console and does a bunch of other stuff. The right mode should already be set at that point, not later.

I remember from past troubleshooting that changing the I/O mode on a console stream after it's already been allocated and used shouldn't be done and can lead to various bugs and weird interactions. I don't have any hard proof at hand to validate this, though. The general wisdom that has worked well for me is to choose either char (ASCII/UTF-8/MBCS) or wchar (Unicode/UTF-16) for the whole program at the very beginning and stick with it, never mixing them up unless absolutely necessary, and to make sure that whatever CRT library you import (which generates invisible code at compilation) understands the choice you made.

Dealing with character encodings can be painful in general. However, UTF-8 is an okay solution, while UTF-16 should be avoided at all costs. Explicitly #undef'ing UNICODE, _UNICODE and _MBCS before including any Windows headers should make sure that UTF-16 mode is not being secretly set by some CRT or STL function behind your back, and should set up the right mode at CRT initialization.
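Concretely, that pinning could look like this at the top of a translation unit, before any Windows or CRT headers are pulled in:

// Pin the build to the char-based ("ANSI") Windows API surface before
// any Windows/CRT headers can see the UNICODE/_UNICODE/_MBCS macros.
#undef UNICODE
#undef _UNICODE
#undef _MBCS
#include <windows.h>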

_MBCS technically supports UTF-8 nowadays as a variable-width encoding, but I would advise against using it, as you don't want to deal with strlen() on a char implementation suddenly meaning the number of characters rather than the number of bytes. In general, I think it's best to always use char and think of everything as bytes right up until it's time to display something to the user. It's the least amount of headache in a world of headaches that is character encoding.

Another problem with the current solution is that using _setmode(_fileno(stdin), _O_WTEXT) decouples the character encodings of input and output within the same console instance. Yes, the API does allow you to set different codepages for console input and output, but that doesn't necessarily mean doing so is supported behaviour.

TL;DR: Don't mix and match character encodings unless absolutely necessary.

anzz1 avatar Apr 19 '23 02:04 anzz1

what's the actual root cause (why zero bytes are being input to a UTF-8 stream in the first place)

Whoa, I think I found the root cause. It's this issue. The problem is in ReadFile() when applied to console handles, so it's not something that can be fixed in either the C or the C++ library (they still have to call ReadFile() at some point).

As I understand the explanation: the console host represents text internally as UTF-16 and has to convert it to an OEM encoding (UTF-8 in our case) before returning it to the ReadFile() caller. However, it assumes that each UTF-16 character corresponds to exactly one OEM character, so of course this breaks for pretty much anything outside ASCII.
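For illustration, a minimal repro sketch, assuming an affected console host build; it dumps the raw bytes ReadFile() returns for a line typed into the console:

#include <windows.h>
#include <cstdio>

int main() {
    // Ask the console to deliver input in the UTF-8 codepage.
    SetConsoleCP(CP_UTF8);
    char buf[256];
    DWORD n = 0;
    // Type a line containing non-ASCII characters. On affected console
    // hosts, each non-ASCII character comes back as a zero byte, because
    // the host assumes a 1:1 UTF-16 -> OEM character mapping.
    if (ReadFile(GetStdHandle(STD_INPUT_HANDLE), buf, sizeof(buf), &n, nullptr)) {
        for (DWORD i = 0; i < n; i++) {
            std::printf("%02x ", (unsigned char) buf[i]);
        }
        std::printf("\n");
    }
    return 0;
}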

I think this has been fixed only in February this year. So unfortunately we can't use UTF-8 input here.

dfyz avatar Apr 19 '23 11:04 dfyz

I think this has been fixed only in February this year

Yep, I just built and ran Windows Terminal from the commit fixing the issue and the previous one. As expected, the former allows me to properly enter non-ASCII symbols with #840 reverted, the latter doesn't.

Of course, we can't rely on everyone having a recent version of Windows Terminal, so I guess that _setmode(_fileno(stdin), _O_WTEXT) has to stay, even though it might interfere with the mode set in _initterm(). Or is there a better solution that might work?

dfyz avatar Apr 20 '23 03:04 dfyz

Perhaps a possible solution: Zig targeting MSVC (x86, x86_64, aarch64). https://github.com/kassane/xwin-zig-test

kassane avatar Jun 23 '23 23:06 kassane

Tangentially related: even if this were fixed, building with Zig on Windows is currently even more broken due to https://github.com/ziglang/zig/issues/15958 unless you pass -fno-lto. I recommend using skeeto/w64devkit instead; even if you edit the Makefile to include -flto, it works fine.

ghost avatar Jun 29 '23 22:06 ghost

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Apr 09 '24 01:04 github-actions[bot]