David Koski
This means that there is a key `base_model` in the weights and there is no such key in the swift code. I am not sure what this key is for...
`safetensors` is the native format for MLX, so that is fine. I wonder if PEFT produces the same structure of weights for the adaptors? @awni do you know about this?...
First the easy part: `Stream.gpu.synchronize()` is the call that waits for GPU activity to finish.
For call `1`, it could potentially observe task cancellation (see #227), but one of the calls, `eval(model)`, can potentially take several seconds. Perhaps it could iterate over the parameters...
Yes, I am not sure if the requester for #227 intended to submit a PR, but as-is it doesn't support cancellation. The notes in that issue should help if you...
@MilanNosal I think what you did looks reasonable, though we would have to test it to make sure. `eval` on a single `MLXArray` will synchronously evaluate the graph that produces...
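A minimal sketch of the "iterate and check for cancellation" idea discussed above, not the approach from #227. The helper `evaluateInChunks` and its `chunkSize` parameter are hypothetical names, and it is written against a generic closure so it runs without MLX. The only assumption it relies on is that evaluating a small group of arrays returns much sooner than evaluating the whole model, which leaves room to check for cancellation in between.

```swift
/// Hypothetical helper: calls `evaluate` on successive chunks of `items`
/// and checks for task cancellation before each chunk.
/// Throws `CancellationError` if the surrounding task was cancelled.
func evaluateInChunks<T>(
    _ items: [T],
    chunkSize: Int = 16,
    evaluate: ([T]) -> Void
) throws {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var start = 0
    while start < items.count {
        // Synchronous check; does nothing when called outside a task.
        try Task.checkCancellation()
        let end = min(start + chunkSize, items.count)
        evaluate(Array(items[start..<end]))
        start = end
    }
}

// Possible use with MLX (an assumption, untested here): `model` is an
// MLXNN `Module` and `eval` accepts an array of `MLXArray`.
//
//     let arrays = model.parameters().flattened().map { $0.1 }
//     try evaluateInChunks(arrays) { eval($0) }
```

Cancellation is only observed between chunks, so a single slow chunk still runs to completion; a smaller `chunkSize` trades some overhead for quicker response.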
I am not sure either -- that may be the time to first token cost. It requires evaluating the entire graph (the model) for a token. It may require JIT...
It is possible via `setErrorHandler`: https://swiftpackageindex.com/ml-explore/mlx-swift/main/documentation/mlx/seterrorhandler(_:data:dtor:) You would probably set a global variable indicating that an error occurred and then check that. If the prompt is long you could potentially...
that controls the size of the chunks the prompt is fed in -- if using a smaller prefill size does it, then I think that is a great way (and is the...
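The global-flag approach mentioned above might look roughly like this. It is a sketch under assumptions: `ErrorFlag`, `mlxErrorFlag`, `recordError`, `MLXFailure`, and `checkForError` are all hypothetical names, and the callback type is my reading of the `setErrorHandler(_:data:dtor:)` signature in the linked documentation. The block itself does not import MLX, so it only demonstrates the flag pattern.

```swift
import Foundation

// Holds the first reported error message. The lock is there because the
// handler may run on a different thread than the code that checks the flag.
final class ErrorFlag: @unchecked Sendable {
    private let lock = NSLock()
    private var message: String?

    func record(_ text: String) {
        lock.lock(); defer { lock.unlock() }
        if message == nil { message = text }
    }

    // Returns the recorded message (if any) and clears it.
    func take() -> String? {
        lock.lock(); defer { lock.unlock() }
        defer { message = nil }
        return message
    }
}

// The global variable: a C-convention callback cannot capture local state.
let mlxErrorFlag = ErrorFlag()

// Callback with a C calling convention. Assumed to match the handler
// parameter of `setErrorHandler`; the second argument (`data`) is unused.
let recordError: @convention(c) (UnsafePointer<CChar>?, UnsafeMutableRawPointer?) -> Void = { message, _ in
    mlxErrorFlag.record(message.map { String(cString: $0) } ?? "unknown error")
}

struct MLXFailure: Error { let message: String }

// Call this after a step that may have failed, e.g. after each prompt chunk.
func checkForError() throws {
    if let message = mlxErrorFlag.take() {
        throw MLXFailure(message: message)
    }
}

// Registration with MLX (assumed, not exercised here):
//     setErrorHandler(recordError, data: nil, dtor: nil)
```

Checking the flag after each prefill chunk keeps the window between an error occurring and the caller noticing it short.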
The latest tag of mlx-swift should support the boolean masks. We have another one coming soonish -- maybe I will wait and rev it to the latest if it happens...