In interactive/chat mode, sometimes User: does not appear and I need to manually type in my nickname

Open x02Sylvie opened this issue 1 year ago • 10 comments

  • In interactive/chat mode, sometimes User: does not appear and I need to manually type in my nickname. For example:

AI: Hello
User: Hello
AI: How are you

Instead, nothing appears where User: should, and I need to type in User: manually. If I press Enter without typing anything, llama diverges from the conversation and starts spouting random stuff.

It also sometimes happens with the AI's reply, as if its reply was eaten and I can type text in its place or just press Enter.

x02Sylvie · Mar 21 '23

I've found that sometimes the AI thinks it's chatting in a forum, so the user can get kicked out of the chat 😂

FNsi · Mar 22 '23

This happens to me frequently.

It also often (but not always) ignores the reverse prompt and continues generating the user's reply, even when User: has been generated.

pjlegato · Mar 24 '23

I've experienced similar behavior with both older versions and the latest version of llama.cpp. In interactive mode, the conversation sometimes hangs, and only continues when you hit Enter. See the screenshot below; circled areas indicate the line break after hitting Enter.

@x02Sylvie unlike in your case, it seems to carry on the conversation just fine.

[Screenshot 2023-03-28 at 1:25:50: circled areas show the line breaks after hitting Enter]

jessejohnson · Mar 28 '23

Have you tried increasing -n to some large value? I think the n_remain logic in main.cpp might be faulty.
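Something like this (the model path here is a placeholder; -n is the short form of --n_predict, and -1 should mean no fixed limit if your build supports it):

./main -m models/your-model.bin -i -r 'User:' -n -1 -f prompts/chat-with-bob.txt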

ggerganov · Mar 28 '23

> I've experienced similar behavior with both older versions and the latest version of llama.cpp. In interactive mode, the conversation sometimes hangs, and only continues when you hit Enter.
>
> @x02Sylvie unlike in your case, it seems to carry on the conversation just fine.

Facing the same issue, does anyone know why?

hengjiUSTC · Apr 06 '23

@jessejohnson @hengjiUSTC Very likely this is what we call the "context swap" - it occurs when the context is full and it takes a few seconds for the generation to continue. You don't need to press Enter - just wait for the context to be recomputed
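For the curious, here is a simplified sketch of what the swap does (paraphrased, with approximate names, not the exact examples/main/main.cpp code of the time): keep the first n_keep tokens, drop the older half of everything after them, and re-evaluate the newer half.

```cpp
#include <vector>

// Simplified sketch of the "context swap" (names approximate, not the
// exact main.cpp code). When the pending batch would overflow the
// n_ctx-token window, keep the first n_keep tokens (e.g. the prompt),
// drop the older half of the rest, and queue the newer half for
// re-evaluation.
void context_swap(std::vector<int> &embd,            // tokens pending evaluation
                  const std::vector<int> &last_toks, // the last n_ctx tokens seen
                  int &n_past, int n_ctx, int n_keep) {
    if (n_past + (int) embd.size() > n_ctx) {
        const int n_batch = (int) embd.size();
        const int n_left  = n_past - n_keep;

        n_past = n_keep;

        // Re-insert the newer half of the evicted tokens in front of the
        // pending batch. Re-evaluating them is the pause that looks like
        // a hang: no new tokens are printed until this catches up.
        embd.insert(embd.begin(),
                    last_toks.begin() + n_ctx - n_left/2 - n_batch,
                    last_toks.end()   - n_batch);
    }
}
```

The re-evaluation of roughly half the window is exactly the multi-second pause people are mistaking for a hang.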

ggerganov · Apr 07 '23

> @jessejohnson @hengjiUSTC Very likely this is what we call the "context swap" - it occurs when the context is full and it takes a few seconds for the generation to continue. You don't need to press Enter - just wait for the context to be recomputed

Agreed, I have noticed that sometimes waiting a while lets it resume. It doesn't always happen (or I need to be more patient 😅)

jessejohnson · Apr 13 '23

This may be the case sometimes. At other times, it doesn't matter how long I leave it; it just halts until I press Enter, then instantly starts producing text again.

pjlegato · Apr 13 '23

I'm having the same problem. Today I left it stuck for ~30m in the middle of writing a Rust function, and it didn't output anything during that period. My CPU remained at ~2% during that time. I just pressed enter and it immediately resumed, and the CPU jumped back up.

I'm running commit e9a9cb0c54461ffbda75b7b2f99f3ea5562291c2 compiled with CMake in release mode, with model Vicuna 13B 1.1.

Running with build/bin/main -m ../models/ggml-vicuna-13b-1.1-q4_2.bin -i -r 'User:' -f prompts/chat-with-bob.txt

I think the issue is something simple and I/O-related, not a token generation problem. The process is stuck on a read call:

[screenshot: the process blocked on a read syscall]

As soon as I type Enter, the read completes and the process continues.
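To check the same thing yourself (a rough sketch: <pid> is a placeholder for main's process id, and the output is illustrative), attach strace to the running process:

strace -p <pid>

If the last thing it prints is an unfinished read(0, ... call, the process is blocked reading from stdin (file descriptor 0), i.e. waiting for you to press Enter.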

diegov · Apr 22 '23

> I'm having the same problem. Today I left it stuck for ~30m in the middle of writing a Rust function, and it didn't output anything during that period. My CPU remained at ~2% during that time. I just pressed enter and it immediately resumed, and the CPU jumped back up.

Actually my issue was that I had forgotten to set --n_predict -1, so it was just breaking for input after 128 tokens. Not an actual bug, and probably nothing to do with the original issue reported.
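For reference, that means the invocation above just needed the extra flag (a reconstruction, not copied from a working session):

build/bin/main -m ../models/ggml-vicuna-13b-1.1-q4_2.bin -i -r 'User:' -f prompts/chat-with-bob.txt --n_predict -1

With --n_predict -1 there is no fixed generation budget, so it doesn't stop to wait for input after the default 128 tokens.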

diegov · Apr 23 '23

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] · Apr 10 '24