llama.cpp
Add --multiline-input (previously --author-mode)
I wanted to make it easier for authors to use this in their writing process.
I've tested the code on both Linux and Windows. I don't see why it wouldn't also work on a Mac, but it would be nice to have confirmation of that.
I also apologize for the volume of lines changed but each change seemed tied to the next so I ended up putting it through as one commit. With any code changes, my goal was to make main.cpp simpler to read. There's still a lot that can be done, but I do believe the input loop is a lot cleaner now.
Changes:
- Adds --author-mode, where we read from input until the user ends their line with a \
- Allows the user to end with / such that inference starts from that position (without the \n)
- Highlights the above-mentioned control characters (when --color is enabled) so it's harder to accidentally end a line with \ or /
- Cleans up superfluous control characters and line breaks
Before: (screenshot)

After: (screenshot)
Detailed changes
Author mode allows one to write (or paste) multiple lines of text without ending each line in a \. LLMs do better the more context you give them, but the original input flow made this difficult. (Before pasting your writing, you'd have to edit a \ into each line, or paste it into a file, save that file, and then pass that path in with -f. If you're doing multiple revisions, that can become overwhelming.)
Another useful feature for writing is the ability to start inference in the middle of a line. It’s not uncommon for writers to get stuck in the middle of a paragraph or piece of dialogue. I wanted to provide a way for the language model to generate text from that point, but previously, the best one could do was to hit enter and have the code append a \n to the buffer before starting inference.
I considered making author mode the default and creating a --chat-mode option with the previous behavior (and implying chat mode with --instruct). However, I decided against changing the default interaction people were familiar with. As a compromise, I allow the / operator to work in both modes.
Another improvement I made for writing with the AI was to eliminate the superfluous newlines and control characters that can clutter up text while working. To achieve this and display text properly, I changed the code to read one character at a time. To start with, when the user hits enter, we don't have to print a newline. It allows for smoother use of / and \ and lets us remove any control characters from the input before starting text inference. This change also makes the Windows and Linux versions behave consistently in terms of input buffering, with both advantages and disadvantages. For instance, hitting a key while the model is inferring won't mix it into the text on Linux anymore, but you also won't see any of the input you type while the model is loading.
The other code changes are to console_state. It now holds the previous termios settings so we can safely reset the terminal on exit for POSIX systems. This also means including termios.h. The other change was to add the author_mode boolean. I thought about calling it input_mode and making an enum but unless we want to have different input modes for chat, instruct, etc, this is just simpler (and it's easily changed in the future).
I also considered finding a way of hiding that first space that's generated when priming the llama models since the original llama tokenizer also starts each prompt with a single space. Are we sure this is necessary? If so, would it be acceptable to just hide that first space?
Future possible changes
Further quality of life enhancements:
- Full support for left, right, ins, del, home, and end keys
- I don't foresee any issues implementing this other than there just being a larger code base, which may be unwanted
- At some point using a library like ncurses is better, but to keep the requirements to a minimum, adding this ourselves doesn't feel unreasonable
- Use color by default for interactive terminals (that aren't being piped)
- Implemented with isatty on POSIX systems and _isatty on Windows
- Programs like grep already do this
- I would suggest adding --color=always,never,auto just like grep has, but perhaps switch from color=never to color=auto in any interactive mode (for compatibility, --color without an option would just indicate always)
- Change the cursor to indicate when the user is waiting on inference
- We could probably only do this when use_color is true, since changing the cursor will leave artifacts in stdout if piped
- Choose a different color for control characters \ and /
- I think the current colors are clear enough, but if someone feels otherwise, it would be easy to use a different color
- Read input on another thread
- This would give us the ability to process each line as it is added as opposed to waiting until all input has been entered. This will also allow us to take input a little earlier as the model is being primed with " "
- Would make the program "feel" a little faster
Further code improvements:
- Replacing last_n_tokens with a ring buffer
- Abstract away signal handling code to common to remove the last of the #ifdef blocks in main.cpp
- Evaluate conversions from time_t to int32_t and static cast them
Of the future possible changes, supporting the other control keys is most appealing to me, followed by putting the input on a separate thread to allow context chugging on previous lines as the user writes subsequent ones. I'm not very active on GitHub so I have no way of proving it, but I'm quite experienced with threading and synchronization. Both of these (particularly the multithreading) may add more complexity to the code than is desired for an "example" program.
Edit: This is also a first step towards moving away from Ctrl-C to interrupt inference, if that's desired. Once we are handling keys one at a time like this, we can more easily make Esc (or any other key) interrupt inference.
Compiles and appears to work fine on Intel macOS
Looking through the pull requests I've noticed others wanting to add their own options. If we're going to keep adding options, it might be nice to separate common options from advanced options. When someone runs main without any parameters or we get an invalid_param, we could display the common options and when someone uses -h or --help we could show both the regular and advanced options. Programs like grep and wget do this.
The "advanced" options would be things your average user isn't likely to use. Things such as --ignore-eos, --perplexity, and --mtest. I'd even go as far as saying --author-mode belongs there just to keep things shorter and less overwhelming.
I was even hesitant myself in adding another command switch when making this pull request. If this goes through, I'll submit another pull request to divide these options up. Although perhaps there's already enough options for such a separation.
I think it's best to rename this to something that more clearly describes what it does. Vim has a paste mode. My two alternative names are --paste-mode and --multiline-input. I'm going to go with the latter since that's the most descriptive. I'm still open to other ideas.
And this commit still contains quality of life improvements authors may like. I'm worried the size of this pull request is keeping reviewers away. I may also try to separate the console changes into its own pull request but it's less justified without the different line ending support added here.
This prevents any non‐ASCII input for me on GNU/Linux. I tried with various terminals, shells, locales, and input methods. I verified that the input is not simply invisible (not echoed), but that it is not being put into the input buffer.
Here are some strings that become impossible to enter:
français
non‐ASCII
👱🏻‍♀️
I did have this working, at least partially. This was the bug:
if (input_char < 32)
It should have been:
if ((unsigned)input_char < 32)
It's working with unicode for me if you want to try again @Senemu.
It is working for me now too. 👏
I can enter all characters now, but when deleting them, the terminal output does not match if the characters are fullwidth or combining.
For example, if I type Test and press Backspace, t is deleted, but the cursor goes back only half its width. Then, if I press Backspace again s is not deleted from the terminal, but it is deleted from the input buffer, so there is a mismatch between what I see and what LLaMA sees.
Something similar happens when using combining characters: If I type á and press Backspace once, the character is visually deleted, but, in reality, a remains. This even allows me to seemingly alter text I didnʼt type, since I can then press Backspace again, putting the cursor behind its initial position.
@DannyDaemonic
This change is great - merge it when it is ready
Ok, pretty sure I got it this time.
I tested it with "Čüßtøm åçćéñtš ğḥḯṯ ḵḷṃñṓṕ ŕśẗŭṿẅẋýź Жизнь Волга ありがとう 你好".
This string includes Latin-based characters with diacritics, Cyrillic characters ("Жизнь Волга" means "Life Volga"), and phrases in Japanese ("ありがとう" means "Thank you") and Chinese ("你好" means "Hello"). These characters should display correctly in most modern terminals and fonts. No problems in Linux but the default Windows command prompt has an issue with a couple characters (the newer Windows Terminal shows them all). Either way, the code seems to handle everything correctly.
@Senemu Did you want to try breaking it again? 😄 Pretty sure I got it this time.
Did you want to try breaking it again? :smile: Pretty sure I got it this time.
I think that you did! 😃
The only thing I have found is that if I type a combining character by itself at the start of an input, it applies to the preceding character instead of being treated as an isolated combining character. As such, it is impossible to (visually) delete it. Ideally, a combining character without a corresponding base character (which is what conceptually happens in this case, even if there happens to be a character before it in the terminal output) should display by itself (as if combined with a blank space). But note that wanting to do this and expecting it to work is weird, and that the unpatched main does not handle it any better.
Combining characters work as expected in any other circumstance I tested. I couldnʼt make what I see and what LLaMA sees mismatch. 👏
Oh, good catch. That's something else I didn't consider. The original getline doesn't handle this any better, but it would be easy to prevent the user from starting with a combining character since they are clearly defined.
I guess the bigger question is, is there any utility in starting your input with a combining character? If the text ended on "cafe", you could turn it into "café", and it does seem LLaMA models can tokenize the combining character separately, but when tokenizing, it doesn't break it out automatically, it's got to be written like that. I wonder if the text used for training was normalized in some way or if it's been trained on both representations of the letters.
text: 'cafe'
number of tokens = 2
1113 -> 'ca'
1725 -> 'fe'
text: 'café' (without combining character)
number of tokens = 3
1113 -> 'ca'
29888 -> 'f'
29948 -> 'é'
text: 'café' (with combining character)
number of tokens = 3
1113 -> 'ca'
1725 -> 'fe'
30103 -> '́'
I guess since it "breaks" the output on the screen, I feel like the correct thing to do is to prevent user input from starting with a combining character. But at the same time I'm hesitant to take that away if someone depends on that use case. Any thoughts?
One realistic use case for combining characters at the beginning of input that I can come up with is an exchange such as this:
USER: From here on I will type a single character per message. You will tell me its Unicode code point and name. Ready?
ChatLLaMa: Ready!
USER: ñ
ChatLLaMa: The Unicode code point for 'ñ' is U+00F1 and its name is "LATIN SMALL LETTER N WITH TILDE".
USER: ‽
ChatLLaMa: The Unicode code point for '‽' is U+203D and its name is "INTERROBANG".
USER: ̈
ChatLLaMa: The Unicode code point for '̈' is U+00E4 and its name is "LATIN SMALL LETTER A WITH DIAERESIS".
Note that the answer is wrong.
If I do something more reasonable instead…
USER: ◌̈
ChatLLaMa: The Unicode code point for '◌̈' is U+0308 and its name is "COMBINING DIAERESIS".
…the answer is correct[^1].
It might be better not to take the possibility to start input with a combining character away, since it can be used to research things like this. I think that the patch is fine as it is right now. People breaking things this way probably know what they are doing and what to expect. But if it is a simple change to forbid the insertion of combining characters at the beginning of input to prevent corrupting the terminal output, I think that would be fine too.
Regarding the models, they seem to treat combining and precomposed characters similarly in text, but I bet that pretty much all of their training data was in Normalization Form C (NFC), even if no attempt was made at normalizing it.
ChatLLaMa sees different normalization forms as spelling variations:
USER: What is a café?
ChatLLaMa: Café (also spelled café or cafe) is an establishment which primarily serves coffee, related drinks, and light refreshments.
It seems like the different encodings of café (with a combining acute accent) and café (with a precomposed letter) confuse it. In this scenario, it is probably better to always use NFC for input.
[^1]: It is ignoring the base character, but I would say that considering it just a carrier for the combining character is the right interpretation in this case.
Windows console has more quirks to work around than there are numbers in the Fibonacci sequence. What a nightmare.
@Senemu This covers all the characters and control codes now.
I love how smooth the interaction is. It feels like a professional piece of software.
I love it too! Itʼs working as I would expect with everything I threw at it now.
I didnʼt even consider how hard it would be to handle variable‐width input like a tab character. I was just trying to paste in some code. 😅
But you made it work! It feels like a professional piece of software because you did a professional job. 😉
After this update, the program began to hang after entering the input. I'm using w64devkit to run.
Can you explain hang? Like it just waits? Is your CPU going? Can you put more input in?
I enter an input, press Enter, and for some reason the cursor does not go to a new line. After that, my 6-core 6-thread processor is loaded to 20% by this process, although 3 cores are specified in the arguments. I can't add anything more to the input; all my additional inputs are simply ignored. I tried waiting for 5 minutes; nothing changes, the load on the processor stays the same, and the RAM stays filled the same way. Changing the model does not change anything. If I press Ctrl+C, it displays the timings as before. I compiled the program with make LLAMA_OPENBLAS=1; if I just build it through make without arguments, for some reason it complains that it cannot find cblas_sgemm, although it used to compile fine without OpenBLAS. I can't say for sure when it stopped compiling without OpenBLAS; it may even have been before your commit.
@newTomas: Try make clean before compiling again without OpenBLAS.
You could also pass the argument --batch_size 8 (or any other value below 32) to prevent the compiled program from using BLAS.
@Senemu make clean helped build the version without blas, also added --batch_size 8 just in case. The problem did not go away, besides, I was able to run it in a regular cmd and powershell and the same problem was there.
I added a few lines to the code to find where the problem might be. Looks like getwchar hangs when hitting enter.
It's strange getwchar would get stuck. Can you make it also print directly before the call as well? If it's really stopping in getwchar you should always just be able to press another character to get things moving.
I see "Ярлык" in your title bar, so your system is configured for Russian? I wonder if it's related to that or if it could be due to something misconfigured in w64devkit. I wonder if it would still get stuck with a different keyboard layout.
Yes, I have a Russian version of Windows. Here are screenshots with output to the console before calling getwchar. From the layout and the console, nothing changes.
Well, it looks like you've narrowed it down to the getwchar call. There's no reason it should be hanging on Enter.
w64devkit uses Mingw, so my suspicion is there's a bug in their version of getwchar. All these wchar functions are kludgy. Again, the other difference is that you're running the Russian version of Windows. Have you tried an official build to see if it happens there?
Either way, I may be able to replace getwchar in Windows with _getwch which is similar but designed to work with non-blocking functions like _kbhit so it may work better.
Indeed, the official build works correctly. If you make a version for w64devkit with mingw, I'll test it.
If it's only happening in w64devkit then this has to be a mingw bug. More precisely, w64devkit uses the mingw-w64 toolchain, which is a fork of the original mingw project. Are you able to try other versions of w64devkit/mingw-w64? It might be fixed in a later version or not be there in an earlier version.
I'll try to swap out getwchar for a similar function to see if we can work around the bug, but we're pretty limited when it comes to Unicode support in Windows. You are forced to use the wide character functions.
Are you a programmer or developer yourself? We could file a bug report with w64devkit/mingw-w64, but you typically want the smallest example code that features the bug. I don't have time to test it right now but I believe I've boiled down the bug to just this code:
#include <windows.h>
#include <winnls.h>
#include <fcntl.h>
#include <wchar.h>
#include <stdio.h>
#include <io.h>
int main() {
    // Initialize for reading wchars and writing out UTF-8
    DWORD dwMode = 0;
    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    if (hConsole == INVALID_HANDLE_VALUE || !GetConsoleMode(hConsole, &dwMode)) {
        hConsole = GetStdHandle(STD_ERROR_HANDLE);
        if (hConsole != INVALID_HANDLE_VALUE && (!GetConsoleMode(hConsole, &dwMode))) {
            hConsole = NULL;
        }
    }
    if (hConsole) {
        SetConsoleMode(hConsole, dwMode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
        SetConsoleOutputCP(CP_UTF8);
    }
    HANDLE hConIn = GetStdHandle(STD_INPUT_HANDLE);
    if (hConIn != INVALID_HANDLE_VALUE && GetConsoleMode(hConIn, &dwMode)) {
        _setmode(_fileno(stdin), _O_WTEXT);
        dwMode &= ~(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT);
        SetConsoleMode(hConIn, dwMode);
    }

    // Echo input
    while (1) {
        // Read
        wchar_t wcs[2] = { getwchar(), L'\0' };
        if (wcs[0] == WEOF) break;
        if (wcs[0] >= 0xD800 && wcs[0] <= 0xDBFF) { // If we have a high surrogate...
            wcs[1] = getwchar(); // Read the low surrogate
            if (wcs[1] == WEOF) break;
        }

        // Write
        char utf8[5] = {0};
        int result = WideCharToMultiByte(CP_UTF8, 0, wcs, (wcs[1] == L'\0') ? 1 : 2, utf8, 4, NULL, NULL);
        if (result > 0) {
            printf("%s", utf8);
        }
    }
    return 0;
}
There might be other settings we change that trigger the bug in mingw-w64, but I'm guessing it's in this snippet. (We might even be able to make it smaller by cutting out some of that initialization.) If you're able to test that and see if the bug still happens, we could consider opening a bug report with mingw-w64.
I have quite a bit of experience with C/C++; my main language is TypeScript. I created a folder with a main.cpp file, put your code into it, and compiled it with g++ main.cpp -o main.
The bug is reproducing!
g++ --version returns
g++ (GCC) 12.2.0
I also have the latest version of cygwin64, the problem is also reproduced here.
g++ --version in cygwin64
g++.exe (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 12.2.0
I also found a separate mingw64 and the same problem there.
g++.exe --version in mingw64
g++.exe (x86_64-win32-seh-rev0, Built by MinGW-W64 project) 12.2.0
I also found mingw64 in msys64 and the same problem there.
g++.exe --version in msys64
g++.exe (Rev4, Built by MSYS2 project) 12.2.0
Then I downloaded the latest version of w64devkit, but this bug is still there.
g++.exe --version in latest w64devkit
g++ (GCC) 13.1.0
Is this a bug in the compiler itself and has been for a long time?
The bug is reproducing!
That's great for two reasons. First, there's nothing crazy we're doing. This is a perfectly acceptable way to read Unicode in Windows. And second, that means we can open a bug report with that small snippet. You spent the time isolating the function at large, so if you'd like you can take credit and open a bug report. (I edited the code in my previous reply to use CP_UTF8 and _O_WTEXT directly. I wasn't sure where they were defined but it's probably included in windows.h and will look less hacky if we just use those values directly.)
Is this a bug in the compiler itself and has been for a long time?
It even happens in Cygwin? Is that compiler even based on MinGW? Either way, it could be the compiler itself or their implementation of (or wrapper around) Windows' getwchar. If they are expecting/waiting for a particular key, it could even be a \r\n vs \n issue.
Edit: Oh, in your cygwin output it says "Built by MinGW-W64". For some reason I thought they used regular GCC internally or something.
If you remove the line _setmode(_fileno(stdin), _O_WTEXT); then the Cyrillic alphabet stops working, but after pressing enter it does not freeze, but moves the cursor to the beginning of the line and the subsequent input overwrites the previous one.
I tried this code:
setlocale(LC_ALL, "UTF-8");
wint_t wc;
while (WEOF != (wc = getwc(stdin)))
    printf("wc = %lc\n", wc);
There are no freezes when pressing enter, but Cyrillic is not displayed when outputting.
Google gives such simple code for reading Unicode, but whichever snippet I took, none of them worked. Why is C++ programming such a pain? I'll probably rewrite main.cpp in Node.js.
_setmode(_fileno(stdin), _O_WTEXT); is how we tell Windows we want to read Unicode characters. So if you take it out, you won't be able to read Cyrillic.
I like to test with these two strings:
Hello, こんにちは, Привет, Bonjour, 你好, Hola, السلام عليكم, Guten Tag, 안녕하세요!
And
å̄b̈çd̦ȩḟğḥḯṯḵḷṃñṓṕŕśẗŭṿẅẋýź
You may have to use Windows Terminal for some of these characters to show up properly. Limitations in the Windows Command Prompt prevent some of them from displaying there.
It's strange: I installed Windows Terminal, and the code that I provided suddenly worked with Cyrillic in both cmd and Windows Terminal, but other languages (こんにちは, 你好, السلام عليكم, 안녕하세요) still don't work in either cmd or Windows Terminal. I also tried the following code:
setlocale(LC_ALL, "UTF-8");
wint_t wc[2] = {0, 0};
while (WEOF != (wc[0] = getwc(stdin))) {
    if (wc[0] >= 0xD800 && wc[0] <= 0xDBFF) {
        wc[1] = getwc(stdin);
    }
    printf("wc = %ls\n", wc);
    wc[1] = 0;
}
Everything is the same in it.
cmd:
windows terminal with powershell:
windows terminal with cmd:
What do we do? Any ideas how to make it work with all characters and new line?