jarcen
I mentioned it in another issue: llama was trained with tokenizer augmentations, meaning the tokenizer occasionally did sub-optimal word partitioning at training time: https://github.com/google/sentencepiece#subword-regularization-and-bpe-dropout It is claimed this improves generalization....
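For a rough idea of what that augmentation looks like in practice, here is a small sketch using the SentencePiece C++ API directly (the `tokenizer.model` path is just a placeholder): with sampling enabled the same string gets split differently from run to run, whereas the greedy encoding used at inference time always returns one fixed segmentation.
```cpp
#include <cstdio>
#include <string>
#include <vector>

#include <sentencepiece_processor.h>

int main() {
    sentencepiece::SentencePieceProcessor sp;
    if (!sp.Load("tokenizer.model").ok()) return 1;   // placeholder model path

    const std::string text = "New York";
    for (int i = 0; i < 3; ++i) {
        std::vector<std::string> pieces;
        // nbest_size = -1 samples from all hypotheses, alpha controls smoothing;
        // each call may return a different (sub-optimal) segmentation
        if (!sp.SampleEncode(text, /*nbest_size=*/-1, /*alpha=*/0.1f, &pieces).ok()) return 1;
        for (const auto & p : pieces) printf("[%s] ", p.c_str());
        printf("\n");
    }
    return 0;
}
```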
Throwing out some ideas about the actual reasons behind the bug. I think it's the classic integer division gotcha: https://github.com/ggerganov/llama.cpp/blob/8cf9f34eddc124d4ab28f4d2fe8e99d574510bde/main.cpp#L757-L758 If the batch size 'N' > 1 then there will be loss of...
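To make the gotcha concrete, a tiny standalone sketch (the figures are made up, not taken from an actual run): dividing the measured usage by the batch size with integer division silently rounds the per-token estimate down, so any later prediction of the form `mem_per_token * n_tokens` can undershoot the real requirement.
```cpp
#include <cstddef>
#include <cstdio>

int main() {
    // made-up figures, only to show the truncation itself
    const size_t used_mem      = 59375123;          // bytes used by one batched eval
    const size_t N             = 4;                 // batch size
    const size_t mem_per_token = used_mem / N;      // == 14843780, remainder 3 is dropped

    // rounding up instead of down avoids underestimating later allocations
    const size_t mem_per_token_ceil = (used_mem + N - 1) / N;

    printf("floor estimate: %zu, ceil estimate: %zu\n", mem_per_token, mem_per_token_ceil);
    return 0;
}
```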
Looking further, it also slowly creeps up as the prompt is being read (batch size = 4):
```
Used mem: 59375120, predicted 57474576
Used mem: 59440656, predicted 57474576
Used mem: 59506192, predicted 57474576
...
```
Right, batch processing at least must construct `ggml_diag_mask_inf` for masked attention so that each token in the batch can attend not only to past memory but also to its neighbors in the...
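For illustration, a plain-C++ sketch (not the ggml API) of the additive mask that `ggml_diag_mask_inf` conceptually applies; `n_past` and `N` are arbitrary example values. Each of the N new tokens may attend to all cached positions and to batch neighbors at or before its own position; strictly later positions get -inf added to their scores before the softmax.
```cpp
#include <cmath>
#include <cstdio>
#include <limits>
#include <vector>

int main() {
    const int n_past = 3;   // tokens already in the KV memory (example value)
    const int N      = 4;   // tokens in the current batch (example value)
    const float NEG_INF = -std::numeric_limits<float>::infinity();

    // mask[i][j]: row i is the i-th token of the batch, column j an attended position
    std::vector<std::vector<float>> mask(N, std::vector<float>(n_past + N, 0.0f));
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < n_past + N; ++j)
            if (j > n_past + i)        // strictly in this token's future
                mask[i][j] = NEG_INF;

    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < n_past + N; ++j)
            printf("%5s", std::isinf(mask[i][j]) ? " -inf" : "    0");
        printf("\n");
    }
    return 0;
}
```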
I started messing with this project two hours ago and had exactly the same issue: completely mangled output. It turned out the problem for me was that I compiled it with Cygwin....
My observations: token 4013 = 'This', token 910 = ' This', token 10994 = 'Hello', token 15043 = ' Hello'. Notice the whitespace: they're different tokens. I don't know why the Python library doesn't...
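A quick way to see the difference is to run the strings through SentencePiece directly. This sketch uses the SentencePiece C++ API rather than llama.cpp's own tokenizer code, and `tokenizer.model` is a placeholder path for the LLaMA vocabulary file.
```cpp
#include <cstdio>
#include <vector>

#include <sentencepiece_processor.h>

int main() {
    sentencepiece::SentencePieceProcessor sp;
    if (!sp.Load("tokenizer.model").ok()) return 1;   // placeholder model path

    for (const char * text : {"Hello", " Hello"}) {
        std::vector<int> ids;
        if (!sp.Encode(text, &ids).ok()) return 1;
        printf("'%s' ->", text);
        for (int id : ids) printf(" %d", id);   // the leading space changes the ids
        printf("\n");
    }
    return 0;
}
```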
I don't expect it will. I read the sentencepiece front-page documentation and it says it uses regularization at training time. Basically, it randomly creates suboptimal tokenized strings to improve robustness. It...
@bitRAKE Yes, those are the transformer's hidden states; preserving them is sufficient. Now the question is how to edit them properly. I'm also interested in removing the first n elements to deal...
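For the sake of discussion, a rough sketch of what dropping the first n cached tokens could look like, assuming a contiguous per-layer [n_ctx][n_embd] key/value buffer (the struct and names here are assumptions, not llama.cpp's actual layout). Note this alone is not a complete answer: the cached keys already carry positional information, so naively sliding them changes what the positions refer to.
```cpp
#include <cstdio>
#include <cstring>
#include <vector>

// assumed layout, not llama.cpp's actual structs
struct KvLayer {
    std::vector<float> k;   // n_ctx * n_embd floats, one row per cached token
    std::vector<float> v;
};

// drop the first n_drop cached tokens by sliding the remaining rows forward
void drop_first_tokens(std::vector<KvLayer> & layers, int & n_past,
                       int n_drop, int n_embd) {
    if (n_drop <= 0 || n_drop > n_past) return;
    const size_t keep   = (size_t)(n_past - n_drop) * n_embd;
    const size_t offset = (size_t)n_drop * n_embd;
    for (auto & l : layers) {
        std::memmove(l.k.data(), l.k.data() + offset, keep * sizeof(float));
        std::memmove(l.v.data(), l.v.data() + offset, keep * sizeof(float));
    }
    n_past -= n_drop;   // new tokens are then appended after the kept rows
}

int main() {
    const int n_ctx = 8, n_embd = 4;
    std::vector<KvLayer> layers(2, {std::vector<float>(n_ctx * n_embd),
                                    std::vector<float>(n_ctx * n_embd)});
    int n_past = 6;
    drop_first_tokens(layers, n_past, 2, n_embd);
    printf("n_past after drop: %d\n", n_past);   // 4
    return 0;
}
```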
That's incorrect, and it shouldn't sacrifice anything. It should also be faster on CPU: all the PyTorch transformers I've had to run on CPU were significantly faster at reading prompts than...
They are not computed at the same time. The computations in one layer are separated into the three steps I listed above. Step 2 operates on Query-Key-Value matrices which were already...
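To illustrate the separation with a toy single-head example (this is the standard layer decomposition, not a verbatim reproduction of the three steps from the earlier comment): step 1 projects every token of the batch into Q/K/V independently, step 2 is attention, which only reads those already-computed matrices plus whatever sits in the cache, and step 3 is the per-token feed-forward.
```cpp
#include <cmath>
#include <cstdio>
#include <vector>

const int D = 2;   // toy embedding size

// stand-in for a weight matrix multiplication
std::vector<float> proj(const std::vector<float> & x, float w) {
    std::vector<float> y(D);
    for (int i = 0; i < D; ++i) y[i] = w * x[i];
    return y;
}

int main() {
    std::vector<std::vector<float>> tokens = {{1, 0}, {0, 1}, {1, 1}};   // a 3-token batch
    std::vector<std::vector<float>> Qs, Ks, Vs;

    // step 1: Q/K/V projections for every token in the batch (independent, parallelizable)
    for (const auto & x : tokens) {
        Qs.push_back(proj(x, 0.5f));
        Ks.push_back(proj(x, 1.0f));
        Vs.push_back(proj(x, 2.0f));
    }

    // step 2: attention; token t only uses Q[t] and the K/V of positions <= t,
    // all of which were produced in step 1 (or live in the KV cache from earlier calls)
    for (size_t t = 0; t < tokens.size(); ++t) {
        std::vector<float> scores(t + 1), out(D, 0.0f);
        float denom = 0.0f;
        for (size_t j = 0; j <= t; ++j) {
            float s = 0.0f;
            for (int i = 0; i < D; ++i) s += Qs[t][i] * Ks[j][i];
            scores[j] = std::exp(s / std::sqrt((float)D));
            denom += scores[j];
        }
        for (size_t j = 0; j <= t; ++j)
            for (int i = 0; i < D; ++i) out[i] += scores[j] / denom * Vs[j][i];

        // step 3: per-token feed-forward would go here (omitted, identity stand-in)
        printf("token %zu attn out: %.3f %.3f\n", t, out[0], out[1]);
    }
    return 0;
}
```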