
When to stop in the LLMEval?

MatthewWaller opened this issue 4 months ago • 7 comments

In the LLMEval project, the generation stops after reaching a limit on tokens. Is there a way to configure stopping when it finds a special token? I tried to look for the Phi 3's end token but it seems to go off the rails earlier than when <|end|> or <|endoftext|> appear. Thoughts?

MatthewWaller avatar Apr 24 '24 22:04 MatthewWaller

It should stop at the end-of-sequence (EOS) token ID: https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Evaluate.swift#L199

The fact that it's not stopping likely means it doesn't have the right EOS token ID set. Which model did you try?

awni avatar Apr 25 '24 02:04 awni

@awni I was working with phi34bit (the 4-bit Phi-3 model)

MatthewWaller avatar Apr 25 '24 02:04 MatthewWaller

Looks like this is the eos token for that model: https://huggingface.co/mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed/blob/main/tokenizer_config.json#L340. We'll need to check to make sure the IDs match / the tokenizer is reading it correctly.

awni avatar Apr 25 '24 03:04 awni

Specifically, the code looks for either the unknown token or the EOS token:

        if t == tokenizer.unknownTokenId || t == tokenizer.eosTokenId {

https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Evaluate.swift#L199

The didGenerate block that is passed in can also return .stop if you are implementing this yourself.

davidkoski avatar Apr 25 '24 16:04 davidkoski
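That stop check can be sketched in isolation (a minimal sketch — `Disposition` and `disposition(tokens:maxTokens:stopTokens:)` are illustrative names, not the library's API; token IDs 0 and 32000 are the unknown and EOS IDs discussed in this thread):

```swift
// Sketch only: mirrors the .stop / .more disposition that a
// didGenerate-style callback returns. Not the MLXLLM API itself.
enum Disposition { case stop, more }

// Decide whether to keep generating, given the tokens produced so far.
// stopTokens would hold e.g. the unknown (0) and EOS (32000) IDs.
func disposition(tokens: [Int], maxTokens: Int, stopTokens: Set<Int>) -> Disposition {
    // Stop as soon as the most recent token is a stop token.
    if let last = tokens.last, stopTokens.contains(last) {
        return .stop
    }
    // Otherwise stop only when the token budget is exhausted.
    return tokens.count >= maxTokens ? .stop : .more
}
```

Checking only the last token keeps the callback O(1) per step instead of rescanning the whole array each time.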

Alright, well unknownTokenId is 0 and eosTokenId is 32000, which I believe is correct, and it matches "eos_token": "<|endoftext|>", from HuggingFace. I can see in the debugger that the eosToken is <|endoftext|>. The model just never seems to produce that token. Hmmm. For instance, I can tell phi3 to "Write 3 words" and on HuggingFace chat, it appropriately stops. So I'm guessing it's producing that token for them. It just never shows up in the output I'm getting.

MatthewWaller avatar Apr 25 '24 21:04 MatthewWaller

It may be related to this: https://github.com/huggingface/swift-transformers/issues/92 -- we are not passing in a proper prompt and the generation may be impacted.

That issue is a bit terse but basically the extra tokens are not being honored when tokenizing.

davidkoski avatar Apr 25 '24 21:04 davidkoski

Oh dang, yeah, I see that now, I pass in "<|user|>\nWrite 2 words<|end|>\n<|assistant|>\n" after preparePrompt, and that should be 9 tokens or so. But it's encoded as 24 tokens!

MatthewWaller avatar Apr 25 '24 22:04 MatthewWaller

Saw that https://github.com/huggingface/swift-transformers/issues/92 has been closed and special tokens should now be accounted for. I'm still running into issues with the model returning the '<|end|>' token when the assistant is done — has anyone found a more manual solution for getting the correct Phi-3 response?

tylerckeller avatar Apr 30 '24 15:04 tylerckeller

I made a little project where I directly looked for that token (32001) and returned .stop if I found it, in the LLMEvaluator. Once I did that, and got the correct tokens in preparePrompt, everything worked correctly.

MatthewWaller avatar Apr 30 '24 15:04 MatthewWaller

Gotcha, so something similar to:

let result = await MLXLLM.generate(
    promptTokens: promptTokens, parameters: generateParameters, model: model,
    tokenizer: tokenizer
) { tokens in
    // update the output -- this will make the view show the text as it generates
    let endGen = tokens.contains(32001)
    if tokens.count % displayEveryNTokens == 0 {
        let text = tokenizer.decode(tokens: tokens)
        await MainActor.run {
            self.output = text
        }
    }

    if tokens.count >= maxTokens || endGen {
        return .stop
    } else {
        return .more
    }
}

tylerckeller avatar Apr 30 '24 16:04 tylerckeller

Exactly, and heads up that there is a little bug you may run into at the end, below that bit. I had to change it to

// update the text if needed, e.g. we haven't displayed because of displayEveryNTokens
// keep only the tokens before the <|end|> token (32001)
var validTokens = Array(result.tokens.prefix(while: { $0 != 32001 }))
if !validTokens.isEmpty {
    validTokens.removeLast()
}
let text = tokenizer.decode(tokens: validTokens)
await MainActor.run {
    if result.output != self.output {
        self.output = text
    }
    running = false
    self.stat = " Tokens/second: \(String(format: "%.3f", result.tokensPerSecond))"
}

Because the <|end|> token (and sometimes more) can still show up in that final bit of output.

MatthewWaller avatar Apr 30 '24 16:04 MatthewWaller
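The trimming step above can be pulled out into a small self-contained helper (a hedged sketch — `trimmedBeforeStop` is an illustrative name, and 32001 is Phi-3's <|end|> token ID from this thread):

```swift
// Sketch: return the tokens before the first stop token (e.g. 32001,
// Phi-3's <|end|>). prefix(while:) stops at the first match, so the
// stop token itself and anything after it are excluded, and an empty
// or stop-free token list is handled safely.
func trimmedBeforeStop(_ tokens: [Int], stopToken: Int = 32001) -> [Int] {
    Array(tokens.prefix(while: { $0 != stopToken }))
}
```

Decoding `trimmedBeforeStop(result.tokens)` instead of `result.tokens` keeps the trailing special tokens out of the displayed text.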

Closing now that the main issue has been resolved with transformers.

MatthewWaller avatar Apr 30 '24 16:04 MatthewWaller