StableLM
CC @davidkoski this is the issue I was referring to! Any help here much appreciated 👍
I see a slightly different issue, but likely related:
libc++abi: terminating due to uncaught exception of type std::invalid_argument: [addmm] Last dimension of first input with shape (1,0,-1) must match second to last dimension of second input with shape (2048,2048).
This is coming from Attention -> Linear. The problem stems from the shape of scores being [1, 0, 2048]. This is used to compute:
let valuesHat = (scores.matmul(values)).transposed(0, 2, 1, 3).reshaped(B, L, -1)
which produces a shape of [1, 0, -1] (an issue in its own right, https://github.com/ml-explore/mlx/issues/789, but not the root cause here).
At the top level:
(logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
y is an empty array:
(lldb) po y
array([], dtype=int32)
and it looks like that is what we are getting out of the tokenizer (I am using the default "Why did the chicken cross the road?" prompt).
I will continue looking at this later.
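In the meantime, a cheap guard at the call site would make this failure obvious instead of surfacing as the addmm error above. A minimal sketch, reusing the names from the snippet above (this check is my suggestion, not existing code):

```swift
// Sketch: fail loudly if the tokenizer produced no tokens, rather than letting
// the empty [1, 0] input reach Attention and trigger the addmm shape error.
precondition(y.size > 0, "tokenizer produced no tokens for the prompt")
(logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)
```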
Here is the pretokenizer config:
(lldb) po config.pretokenizers?.arrayValue
▿ Optional<Array<Config>>
▿ some : 2 elements
▿ 0 : Config
▿ dictionary : 4 elements
▿ 0 : 2 elements
- key : "type"
- value : Split
▿ 1 : 2 elements
- key : "behavior"
- value : Removed
▿ 2 : 2 elements
- key : "pattern"
▿ value : 1 element
▿ 0 : 2 elements
- key : Regex
- value : (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^
\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[
]*|\s*[
]+|\s+(?!\S)|\s+
▿ 3 : 2 elements
- key : "invert"
- value : 1
▿ 1 : Config
▿ dictionary : 4 elements
▿ 0 : 2 elements
- key : "add_prefix_space"
- value : 0
▿ 1 : 2 elements
- key : "trim_offsets"
- value : 1
▿ 2 : 2 elements
- key : "use_regex"
- value : 0
▿ 3 : 2 elements
- key : "type"
- value : ByteLevel
which is handled here:
class SplitPreTokenizer: PreTokenizer {
    ...

    func preTokenize(text: String) -> [String] {
        guard let pattern = pattern else { return [text] }
        return pattern.split(text, invert: invert)
    }
}
Invert is true:
(lldb) p invert
(Bool) true
giving us this:
(lldb) p pattern.split(text, invert: true)
([String]) 1 value {
[0] = ""
}
However if invert was false:
(lldb) p pattern.split(text, invert: false)
([String]) 10 values {
[0] = "Why"
[1] = " did"
[2] = " the"
[3] = " chicken"
[4] = " cross"
[5] = " the"
[6] = " road"
[7] = "?"
[8] = " "
[9] = ""
}
That looks reasonable. I don't know if:
- invert should be false (the config seems to set it to true)
- the `StringSplitPattern` isn't handling invert correctly (see the sketch below for the pieces I would expect)
- or there is something unhandled in the regular expression
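For reference, my reading of the HF tokenizers docs is that `Split` with `behavior: Removed` and `invert: true` should keep the regex matches themselves. A quick, standalone way to see what those matches look like, using Foundation's NSRegularExpression directly rather than the swift-transformers code:

```swift
import Foundation

// The Split regex from tokenizer.json, written as a Swift literal. With
// invert: true + behavior: Removed I would expect the pretokenizer to return
// exactly these matches (hedged: this is my reading of the tokenizers docs).
let pattern = "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
let text = "Why did the chicken cross the road?"

let regex = try! NSRegularExpression(pattern: pattern)
let range = NSRange(text.startIndex..., in: text)
let pieces = regex.matches(in: text, range: range).compactMap {
    Range($0.range, in: text).map { String(text[$0]) }
}
print(pieces)
// roughly ["Why", " did", " the", " chicken", " cross", " the", " road", "?"]
// i.e. the same pieces the swift code currently produces with invert: false
```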
If I edit the tokenizer.json and replace the value:
"pre_tokenizer": {
"type": "Sequence",
"pretokenizers": [
{
"type": "Split",
"pattern": {
"Regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\r\n]*|\\s*[\r\n]+|\\s+(?!\\S)|\\s+"
},
"behavior": "Removed",
"invert": false
},
it does ... something :-)
Anyway, that is the cause of this assertion failure.
Here are some ideas on how to debug the model behavior:
- install the llms/mlx_lm code from https://github.com/ml-explore/mlx-examples
  - this gives you a working python version that you can compare against
  - you can invoke it with this:
python -m mlx_lm.generate --model mlx-community/stablelm-2-zephyr-1_6b-4bit --prompt 'why did the chicken cross the road?'
==========
Prompt: <|user|>
why did the chicken cross the road?<|endoftext|>
<|assistant|>
The origin of the popular question, "Why did the chicken cross the road?" is a cultural phenomenon that dates back to ancient times. While it is often used as a humorous play on words, the question likely stems from a desire to understand the behavior of chickens as a group. The answer "because it's Friday" or "it's the chicken's way" does not accurately answer the question, as it doesn't provide a clear reason for the chickens' action. The question is often used for
==========
Notice the augmentation of the prompt -- this is done using python code in the tokenizer configuration. We can't run that from swift, so you may need some configuration to help with this. For example, in the example repo:
- https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Models.swift#L65
Simple, but probably helpful.
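For illustration, the effect for this model would be something like the sketch below. The chat markers come from the python output above; the function name and the trailing newline handling are placeholders, not the actual Models.swift API.

```swift
// Hypothetical helper (not the actual Models.swift API): wrap the raw prompt
// in the stablelm-2-zephyr chat markers seen in the python output above.
func preparePrompt(_ prompt: String) -> String {
    "<|user|>\n\(prompt)<|endoftext|>\n<|assistant|>\n"
}

let augmented = preparePrompt("why did the chicken cross the road?")
// tokenize `augmented` instead of the raw prompt
```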
Given the working python version you can do a few things (a short swift sketch of a few of these steps follows the list):

- the tokenizer produces an array of integers
  - print out the tokens the python code generates, see utils.py: `prompt_tokens = mx.array(tokenizer.encode(prompt))`
  - hard code the swift code to take this same array
  - if this array works then you can suspect something in the tokenizer
- the tokenizer can decode the tokens it prepares
  - make sure it can decode both the tokens the swift tokenizer makes
  - and the tokens the python code makes
- set the random seed
  - `--seed` in the command line tool and `MLXRandom.seed()` in python
  - maybe set the temperature to 0
  - generate a small number of tokens
  - are they the same? the code to produce tokens from the logits might be slightly different between the two, but I found the first token is usually the same with the same seed
- assuming the tokens are different, compare the execution of the models
  - I found something like `print("\(name) \(array.shape) \(array.sum())")` in swift (with similar code in python) can help spot differences without looking at the whole tensor
  - I had typos in the `Attention` layer a couple of times -- incorrectly placed parentheses, etc.
- make sure your weights are loaded correctly
  - I noticed that you turned off verification of the arrays: `try model.update(parameters: parameters, verify: [.none])`
  - turn that on -- at the very least your `Attention` layer has incorrect keys
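Here is a rough swift-side sketch of the seed / fingerprint / verification steps above. The seeding and the shape/sum print are straightforward; the exact verification option name is an assumption on my part, so check `Module.update` in MLXNN:

```swift
import MLX
import MLXRandom

// Match the python run by fixing the seed (use the same value on the python side).
MLXRandom.seed(0)

// Cheap tensor fingerprint: shape + sum is usually enough to spot where the
// swift and python models diverge without dumping whole tensors.
func trace(_ name: String, _ array: MLXArray) {
    print("\(name) \(array.shape) \(array.sum())")
}

// And when loading weights, turn verification back on instead of [.none];
// the exact option (e.g. [.all]) is my assumption -- check the MLXNN API:
// try model.update(parameters: parameters, verify: [.all])
```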
Good luck and ask if you have questions!
@davidkoski thank you so much for the detailed breakdown of what went on here, this made a ton of sense!
The other takeaway for me here is that we need to improve debuggability and implement sanity checks, and probably also expose verification as a visible parameter that can be toggled on/off. I'm going to think on it a bit and add some UI for it – let me know if anything comes to mind here!