David Koski
Yes, there are two parts to it. The first is that you need to manage the stream of messages:

- https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/MLXLMCommon/UserInput.swift#L18

These are typically going to have a role & content...
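For illustration, a minimal sketch of maintaining that stream, assuming the dictionary-based messages initializer in the linked file (verify the exact `Message` type in UserInput.swift):

```swift
import MLXLMCommon

// Accumulate the conversation as role/content messages. The exact
// Message shape is defined in UserInput.swift (linked above); a
// dictionary with "role" and "content" keys is assumed here.
var messages: [[String: Any]] = [
    ["role": "system", "content": "You are a helpful assistant."]
]

// Append each user turn before generating, and the assistant reply
// after generating, so the full history goes into the next prompt.
messages.append(["role": "user", "content": "What is the capital of France?"])

let input = UserInput(messages: messages)
```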
Sure, you can pass it here:

- https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/MLXLMCommon/Evaluate.swift#L288

instead of letting it default to a new one each time. If this API proves inadequate then a PR to fix it...
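Roughly, at the call site (a sketch; `model`, `lmInput`, and `parameters` are assumed in scope, and the `cache:` label here is an assumption, so check the linked Evaluate.swift):

```swift
import MLXLMCommon

// Create the cache once and hold on to it across calls...
let cache = model.newCache(parameters: parameters)

// ...then pass it in rather than letting each call build a new one
// (argument label assumed; see Evaluate.swift above).
let iterator = try TokenIterator(
    input: lmInput, model: model, cache: cache, parameters: parameters)
```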
```python
# Make the initial prompt cache for the model
prompt_cache = make_prompt_cache(model)
```

is the same as:

```swift
let cache = model.newCache(parameters: parameters)
```

and this call:

```python
response...
```
You would want to keep the cache between calls to generate/TokenIterator -- that is the context state that represents past tokens. @awni can you point at how the context cache...
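Put together, a hedged sketch of a multi-turn loop (assuming the `UserInput`/`TokenIterator` shapes from the files linked above; `context` and `parameters` are assumed in scope, and detokenization is elided):

```swift
import MLXLMCommon

// One cache for the whole conversation: it is the context state
// for every token generated so far.
let cache = context.model.newCache(parameters: parameters)

for prompt in ["Hello!", "Tell me more."] {
    // Only the new turn needs to be processed; prior turns
    // already live in the cache.
    let input = try await context.processor.prepare(
        input: UserInput(prompt: prompt))
    let iterator = try TokenIterator(
        input: input, model: context.model, cache: cache,
        parameters: parameters)
    for token in iterator {
        // stream/detokenize `token` here
    }
}
```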
There are now some chat examples and use of KVCache as well. See also #310 and #312
This brings up an interesting point, I think: the Chat app doesn't have a sandbox on macOS. I think it probably should, though storing files in ~/Downloads is convenient. On...
The Swift API consistently (I think) omits the shape label everywhere when shape is required. In general, Python allows labels for everything but can omit them if they are not...
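A small side-by-side for concreteness (these are my reading of the current mlx-swift / mlx signatures, worth double-checking):

```swift
import MLX

let a = MLXArray([0, 1, 2, 3, 4, 5])

// Swift: shape is a required argument, so the label is omitted.
let b = a.reshaped([2, 3])
let z = MLX.zeros([2, 3])

// Python, by contrast, accepts either form:
//   mx.reshape(a, (2, 3))
//   mx.reshape(a, shape=(2, 3))
```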
See #308 and #311
Is the suggestion that the sampler changes when entering the `` token? You would use a standard sampler most of the time, but a special tool sampler inside it?
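For concreteness, a hedged sketch of that switching idea (assuming a `LogitSampler`-style protocol like the one in MLXLMCommon; everything else here is hypothetical, not a real API):

```swift
import MLX
import MLXLMCommon

// Illustrative only: delegate to a constrained "tool" sampler while
// inside a tool-call span, and to the standard sampler otherwise.
final class SwitchingSampler: LogitSampler {
    let standard: LogitSampler
    let tool: LogitSampler
    var insideToolCall = false  // flip this when the tool token is seen

    init(standard: LogitSampler, tool: LogitSampler) {
        self.standard = standard
        self.tool = tool
    }

    func sample(logits: MLXArray) -> MLXArray {
        (insideToolCall ? tool : standard).sample(logits: logits)
    }
}
```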
You can put instructions in the prompt to generate JSON and give examples of how you want it formatted. That said, the model may or may not actually generate JSON....
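For example, the instruction-plus-example approach, with a defensive parse since the output may not be valid JSON (all names here are illustrative):

```swift
import Foundation

// Instructions plus a concrete example of the desired shape.
let systemPrompt = """
    Respond only with JSON in exactly this shape:
    {"city": "<string>", "population": <integer>}
    Do not include any text outside the JSON object.
    """

// The model may still emit invalid JSON, so parse defensively.
func parseJSON(_ text: String) -> [String: Any]? {
    guard let data = text.data(using: .utf8) else { return nil }
    return (try? JSONSerialization.jsonObject(with: data)) as? [String: Any]
}
```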