StableLM
CC @davidkoski this is the issue I was referring to! Any help here much appreciated 👍
I see a slightly different issue, but likely related:
libc++abi: terminating due to uncaught exception of type std::invalid_argument: [addmm] Last dimension of first input with shape (1,0,-1) must match second to last dimension of second input with shape (2048,2048).
This is coming from Attention -> Linear. The problem stems from the shape of scores being [1, 0, 2048]. This is used to compute:
let valuesHat = (scores.matmul(values)).transposed(0, 2, 1, 3).reshaped(B, L, -1)
which produces a shape of [1, 0, -1] (an issue in its own right, https://github.com/ml-explore/mlx/issues/789, but not the root cause here).
At the top level:
(logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
y is an empty array:
(lldb) po y
array([], dtype=int32)
and it looks like that is what we are getting out of the tokenizer (I am using the default "Why did the chicken cross the road?" prompt).
I will continue looking at this later.
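In the meantime, a cheap guard at the call site would make this failure obvious instead of surfacing as the addmm error above. A minimal sketch, reusing the names from the snippet above (this check is my suggestion, not existing code):

```swift
// Sketch: fail loudly if the tokenizer produced no tokens, rather than letting
// the empty [1, 0] input reach Attention and trigger the addmm shape error.
precondition(y.size > 0, "tokenizer produced no tokens for the prompt")
(logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
```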
Here is the pretokenizer config:
(lldb) po config.pretokenizers?.arrayValue
▿ Optional<Array<Config>>
▿ some : 2 elements
▿ 0 : Config
▿ dictionary : 4 elements
▿ 0 : 2 elements
- key : "type"
- value : Split
▿ 1 : 2 elements
- key : "behavior"
- value : Removed
▿ 2 : 2 elements
- key : "pattern"
▿ value : 1 element
▿ 0 : 2 elements
- key : Regex
- value : (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^
\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[
]*|\s*[
]+|\s+(?!\S)|\s+
▿ 3 : 2 elements
- key : "invert"
- value : 1
▿ 1 : Config
▿ dictionary : 4 elements
▿ 0 : 2 elements
- key : "add_prefix_space"
- value : 0
▿ 1 : 2 elements
- key : "trim_offsets"
- value : 1
▿ 2 : 2 elements
- key : "use_regex"
- value : 0
▿ 3 : 2 elements
- key : "type"
- value : ByteLevel
which is handled here:
class SplitPreTokenizer: PreTokenizer {
    ...

    func preTokenize(text: String) -> [String] {
        guard let pattern = pattern else { return [text] }
        return pattern.split(text, invert: invert)
    }
}
Invert is true:
(lldb) p invert
(Bool) true
giving us this:
(lldb) p pattern.split(text, invert: true)
([String]) 1 value {
[0] = ""
}
However if invert was false:
(lldb) p pattern.split(text, invert: false)
([String]) 10 values {
[0] = "Why"
[1] = " did"
[2] = " the"
[3] = " chicken"
[4] = " cross"
[5] = " the"
[6] = " road"
[7] = "?"
[8] = " "
[9] = ""
}
That looks reasonable. I don't know if:
- invert should be false (the config seems to set it to true)
- the `StringSplitPattern` isn't handling invert correctly (see the sketch below for the pieces I would expect)
- or there is something unhandled in the regular expression
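For reference, my reading of the HF tokenizers docs is that `Split` with `behavior: Removed` and `invert: true` should keep the regex matches themselves. A quick, standalone way to see what those matches look like, using Foundation's NSRegularExpression directly rather than the swift-transformers code:

```swift
import Foundation

// The Split regex from tokenizer.json, written as a Swift literal. With
// invert: true + behavior: Removed I would expect the pretokenizer to return
// exactly these matches (hedged: this is my reading of the tokenizers docs).
let pattern = "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
let text = "Why did the chicken cross the road?"

let regex = try! NSRegularExpression(pattern: pattern)
let range = NSRange(text.startIndex..., in: text)
let pieces = regex.matches(in: text, range: range).compactMap {
    Range($0.range, in: text).map { String(text[$0]) }
}
print(pieces)
// roughly ["Why", " did", " the", " chicken", " cross", " the", " road", "?"]
// i.e. the same pieces the swift code currently produces with invert: false
```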
If I edit the tokenizer.json and replace the value:
"pre_tokenizer": {
"type": "Sequence",
"pretokenizers": [
{
"type": "Split",
"pattern": {
"Regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\r\n]*|\\s*[\r\n]+|\\s+(?!\\S)|\\s+"
},
"behavior": "Removed",
"invert": false
},
it does ... something :-)
Anyway, that is the cause of this assertion failure.
Here are some ideas on how to debug the model behavior:
- install the llms/mlx_lm code from https://github.com/ml-explore/mlx-examples
  - this gives you a working python version that you can compare against
  - you can invoke it with this:
python -m mlx_lm.generate --model mlx-community/stablelm-2-zephyr-1_6b-4bit --prompt 'why did the chicken cross the road?'
==========
Prompt: <|user|>
why did the chicken cross the road?<|endoftext|>
<|assistant|>
The origin of the popular question, "Why did the chicken cross the road?" is a cultural phenomenon that dates back to ancient times. While it is often used as a humorous play on words, the question likely stems from a desire to understand the behavior of chickens as a group. The answer "because it's Friday" or "it's the chicken's way" does not accurately answer the question, as it doesn't provide a clear reason for the chickens' action. The question is often used for
==========
Notice the augmentation of the prompt -- this is done using python code in the tokenizer configuration. We can't run that from swift, so you may need some configuration to help with this. For example, in the example repo:
- https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Models.swift#L65
Simple, but probably helpful.
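For illustration, the effect for this model would be something like the sketch below. The chat markers come from the python output above; the function name and the trailing newline handling are placeholders, not the actual Models.swift API.

```swift
// Hypothetical helper (not the actual Models.swift API): wrap the raw prompt
// in the stablelm-2-zephyr chat markers seen in the python output above.
func preparePrompt(_ prompt: String) -> String {
    "<|user|>\n\(prompt)<|endoftext|>\n<|assistant|>\n"
}

let augmented = preparePrompt("why did the chicken cross the road?")
// tokenize `augmented` instead of the raw prompt
```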
Given the working python version you can do a few things (a short swift sketch of a few of these steps follows the list):

- the tokenizer produces an array of integers
  - print out the tokens the python code generates, see utils.py: `prompt_tokens = mx.array(tokenizer.encode(prompt))`
  - hard code the swift code to take this same array
  - if this array works then you can suspect something in the tokenizer
- the tokenizer can decode the tokens it prepares
  - make sure it can decode both the tokens the swift tokenizer makes
  - and the tokens the python code makes
- set the random seed
  - `--seed` in the command line tool and `MLXRandom.seed()` in python
  - maybe set the temperature to 0
  - generate a small number of tokens
  - are they the same? the code to produce tokens from the logits might be slightly different between the two, but I found the first token is usually the same with the same seed
- assuming the tokens are different, compare the execution of the models
  - I found something like `print("\(name) \(array.shape) \(array.sum())")` in swift (with similar code in python) can help spot differences without looking at the whole tensor
  - I had typos in the `Attention` layer a couple of times -- incorrectly placed parentheses, etc.
- make sure your weights are loaded correctly
  - I noticed that you turned off verification of the arrays: `try model.update(parameters: parameters, verify: [.none])`
  - turn that on -- at the very least your `Attention` layer has incorrect keys
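Here is a rough swift-side sketch of the seed / fingerprint / verification steps above. The seeding and the shape/sum print are straightforward; the exact verification option name is an assumption on my part, so check `Module.update` in MLXNN:

```swift
import MLX
import MLXRandom

// Match the python run by fixing the seed (use the same value on the python side).
MLXRandom.seed(0)

// Cheap tensor fingerprint: shape + sum is usually enough to spot where the
// swift and python models diverge without dumping whole tensors.
func trace(_ name: String, _ array: MLXArray) {
    print("\(name) \(array.shape) \(array.sum())")
}

// And when loading weights, turn verification back on instead of [.none];
// the exact option (e.g. [.all]) is my assumption -- check the MLXNN API:
// try model.update(parameters: parameters, verify: [.all])
```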
Good luck and ask if you have questions!
@davidkoski thank you so much for the detailed breakdown of what went on here, this made a ton of sense!
The other takeaway for me here is that we need to improve debuggability and implement sanity checks, and probably also expose verification as a visible parameter that can be toggled on/off. I'm going to think on it a bit and add some UI for it – let me know if anything comes to mind here!