
StableLM

Open vmanot opened this issue 2 years ago • 5 comments

[screenshot attachment: gN5JmWWk.jpg]

vmanot • Mar 05, 2024

CC @davidkoski this is the issue I was referring to! Any help here much appreciated 👍

awni • Mar 05, 2024

I see a slightly different issue, but likely related:

libc++abi: terminating due to uncaught exception of type std::invalid_argument: [addmm] Last dimension of first input with shape (1,0,-1) must match second to last dimension of second input with shape (2048,2048).

This is coming from Attention -> Linear. The problem stems from the shape of scores being [1, 0, 2048]. This is used to compute:

        let valuesHat = (scores.matmul(values)).transposed(0, 2, 1, 3).reshaped(B, L, -1)

which produces a shape of [1, 0, -1] (an issue in its own right, https://github.com/ml-explore/mlx/issues/789, but not the root cause here).

At the top level:

        (logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)

y is an empty array:

(lldb) po y
array([], dtype=int32)

and it looks like that is what we are getting out of the tokenizer (I am using the default "Why did the chicken cross the road?" prompt).
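
A cheap sanity check here (not in the current code; tokenizer, prompt, and the call above are assumed from the snippets shown) would be to fail fast when the tokenizer returns nothing, rather than letting the empty batch reach the model:

// hypothetical guard: bail out before the model sees an empty token array
let promptTokens = tokenizer.encode(text: prompt)
guard !promptTokens.isEmpty else {
    fatalError("tokenizer produced no tokens for prompt: \(prompt)")
}
let y = MLXArray(promptTokens.map { Int32($0) })
(logits, cache) = model(expandedDimensions(y, axis: 0), cache: cache.isEmpty ? nil : cache)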

I will continue looking at this later.

davidkoski • Mar 06, 2024

Here is the pretokenizer config:

(lldb) po config.pretokenizers?.arrayValue
▿ Optional<Array<Config>>
  ▿ some : 2 elements
    ▿ 0 : Config
      ▿ dictionary : 4 elements
        ▿ 0 : 2 elements
          - key : "type"
          - value : Split
        ▿ 1 : 2 elements
          - key : "behavior"
          - value : Removed
        ▿ 2 : 2 elements
          - key : "pattern"
          ▿ value : 1 element
            ▿ 0 : 2 elements
              - key : Regex
              - value : (?i:'s|'t|'re|'ve|'m|'ll|'d)|[^
\p{L}\p{N}]?\p{L}+|\p{N}| ?[^\s\p{L}\p{N}]+[
]*|\s*[
]+|\s+(?!\S)|\s+
        ▿ 3 : 2 elements
          - key : "invert"
          - value : 1
    ▿ 1 : Config
      ▿ dictionary : 4 elements
        ▿ 0 : 2 elements
          - key : "add_prefix_space"
          - value : 0
        ▿ 1 : 2 elements
          - key : "trim_offsets"
          - value : 1
        ▿ 2 : 2 elements
          - key : "use_regex"
          - value : 0
        ▿ 3 : 2 elements
          - key : "type"
          - value : ByteLevel

which is handled here:

class SplitPreTokenizer: PreTokenizer {
...
    func preTokenize(text: String) -> [String] {
        guard let pattern = pattern else { return [text] }
        return pattern.split(text, invert: invert)
    }

Invert is true:

(lldb) p invert
(Bool) true

giving us this:

(lldb) p pattern.split(text, invert: true)
([String]) 1 value {
  [0] = ""
}

However if invert was false:

(lldb) p pattern.split(text, invert: false)
([String]) 10 values {
  [0] = "Why"
  [1] = " did"
  [2] = " the"
  [3] = " chicken"
  [4] = " cross"
  [5] = " the"
  [6] = " road"
  [7] = "?"
  [8] = " "
  [9] = ""
}

That looks reasonable. I don't know if:

  • invert should be false (the config seems to set it to true)
  • the StringSplitPattern isn't handling invert correctly (see the sketch below)
  • or there is something unhandled in the regular expression
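
For reference, here is a minimal standalone sketch (plain Foundation regex, not the swift-transformers implementation) of the two possible readings of invert: treating the pattern's matches as delimiters, versus keeping the matches themselves as the pre-tokens. The invert: false output above corresponds to the keep-the-matches reading, which is what this GPT-2-style regex appears to want:

import Foundation

// sketch only -- illustrates the two interpretations of `invert`
func splitSketch(_ text: String, pattern: String, keepMatches: Bool) -> [String] {
    let regex = try! NSRegularExpression(pattern: pattern)
    let whole = NSRange(text.startIndex..., in: text)
    let matchRanges = regex.matches(in: text, range: whole)
        .compactMap { Range($0.range, in: text) }

    if keepMatches {
        // the regex matches themselves become the pre-tokens
        return matchRanges.map { String(text[$0]) }
    } else {
        // the regex matches act as delimiters; keep the text between them
        var pieces: [String] = []
        var cursor = text.startIndex
        for r in matchRanges {
            pieces.append(String(text[cursor..<r.lowerBound]))
            cursor = r.upperBound
        }
        pieces.append(String(text[cursor...]))
        return pieces
    }
}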

If I edit the tokenizer.json and replace the value:

  "pre_tokenizer": {
    "type": "Sequence",
    "pretokenizers": [
      {
        "type": "Split",
        "pattern": { 
          "Regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\r\n]*|\\s*[\r\n]+|\\s+(?!\\S)|\\s+"
        },
        "behavior": "Removed",
        "invert": false
      },

it does ... something :-)

[screenshot of the generated output]

Anyway, that is the cause of this assertion failure.

davidkoski • Mar 06, 2024

Here are some ideas on how to debug the model behavior:

  • install the llms/mlx_lm code from https://github.com/ml-explore/mlx-examples
    • this gives you a working python version that you can compare against
    • you can invoke it with this:
python -m mlx_lm.generate --model mlx-community/stablelm-2-zephyr-1_6b-4bit --prompt 'why did the chicken cross the road?'

==========
Prompt: <|user|>
why did the chicken cross the road?<|endoftext|>
<|assistant|>

The origin of the popular question, "Why did the chicken cross the road?" is a cultural phenomenon that dates back to ancient times. While it is often used as a humorous play on words, the question likely stems from a desire to understand the behavior of chickens as a group. The answer "because it's Friday" or "it's the chicken's way" does not accurately answer the question, as it doesn't provide a clear reason for the chickens' action. The question is often used for
==========

Notice the augmentation of the prompt -- this is done by Python code using the tokenizer configuration. We can't run that from Swift, so you may need some configuration to help with this. For example, in the example repo:

  • https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Models.swift#L65

Simple, but probably helpful.
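
As a rough sketch of that kind of configuration, hard-coding the template visible in the Python output above might look like the following (the exact template should be verified against the model's tokenizer_config.json):

// assumed template, copied from the mlx_lm output shown earlier
func augmentPrompt(_ prompt: String) -> String {
    "<|user|>\n\(prompt)<|endoftext|>\n<|assistant|>\n"
}

// augmentPrompt("why did the chicken cross the road?")
// -> "<|user|>\nwhy did the chicken cross the road?<|endoftext|>\n<|assistant|>\n"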

Given the working python version you can do a few things:

  • the tokenizer produces an array of integers

    • print out the tokens the python code generates, see utils.py: prompt_tokens = mx.array(tokenizer.encode(prompt))
    • hard code the swift code to take this same array (see the sketch after this list)
    • if this array works then you can suspect something in the tokenizer
  • the tokenizer can decode the tokens it prepares

    • make sure it can decode both the tokens the swift tokenizer makes
    • and the tokens the python code makes
  • set the random seed

    • --seed in the Python command line tool and MLXRandom.seed() in the Swift code
    • maybe set the temperature to 0
    • generate a small number of tokens
    • Are they the same? The code to produce tokens from the logits might be slightly different between the two, but I found the first token is usually the same with the same seed
  • assuming the tokens are different, compare the execution of the models

    • I found that printing something like print("\(name) \(array.shape) \(array.sum())") in Swift (and the equivalent in Python) can help spot differences without looking at the whole tensor
    • I had typos in the Attention layer a couple of times -- incorrectly placed parentheses, etc.
  • make sure your weights are loaded correctly

    • I noticed that you turned off verification of the arrays: try model.update(parameters: parameters, verify: [.none])
    • turn that on -- at the very least your Attention layer has incorrect keys
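
Putting a few of those suggestions together, a rough Swift-side sketch might look like this (token values, option names, and helper names are illustrative, not taken from the repo):

import MLX
import MLXRandom

// 1. fix the random seed so runs are comparable with the Python side
MLXRandom.seed(0)

// 2. hard-code the token ids printed by the Python tokenizer and feed them
//    to the Swift model, bypassing the Swift tokenizer entirely
let pythonTokens: [Int32] = [/* paste the ids printed by mlx_lm here */]
let y = MLXArray(pythonTokens)

// 3. cheap per-layer spot check: shape plus a reduction is easy to diff
//    against an equivalent print in the Python model
func check(_ name: String, _ array: MLXArray) {
    print("\(name) \(array.shape) \(array.sum())")
}

// 4. turn parameter verification back on so mismatched keys fail loudly
//    (assumption: [.all] is the strict option -- check the MLX Swift API)
// try model.update(parameters: parameters, verify: [.all])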

Good luck and ask if you have questions!

davidkoski • Mar 06, 2024

@davidkoski thank you so much for the detailed breakdown of what went on here, this made a ton of sense!

The other takeaway for me here is that we need to improve debuggability + implement sanity checks, and also probably expose verification as a visible parameter that can be toggled on/off. I'm going to think on it a bit and add some UI for it – let me know if anything comes to mind here!

vmanot • Mar 07, 2024