Awni Hannun
The memory needed for long prompts scales with the square of the prompt length. So in your case: `3500 * 3500 * num_heads * 2` would be the memory used...
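As a rough illustration of that estimate, here is a minimal sketch of the arithmetic. It assumes the trailing `2` in the formula is the number of bytes per element (16-bit floats). `numHeads = 32` is a hypothetical value chosen only for the example, and `attentionScoreBytes` is an illustrative helper, not part of MLX.

```swift
// Rough size of the attention score matrix, following the formula above:
// promptLength * promptLength * numHeads * bytesPerElement.
// Assumption: the "2" in the formula is bytes per element (16-bit floats).
func attentionScoreBytes(promptLength: Int, numHeads: Int, bytesPerElement: Int = 2) -> Int {
    return promptLength * promptLength * numHeads * bytesPerElement
}

// numHeads = 32 is a hypothetical head count, used only for illustration.
let bytes = attentionScoreBytes(promptLength: 3500, numHeads: 32)
let gib = Double(bytes) / Double(1 << 30)
print("\(bytes) bytes, about \(gib) GiB")

// Doubling the prompt length quadruples the estimate.
let doubled = attentionScoreBytes(promptLength: 7000, numHeads: 32)
print("ratio: \(doubled / bytes)")
```

Because the estimate grows with the square of the prompt length, halving the prompt cuts it to a quarter.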
Could you share what you ran that crashed?
@DePasqualeOrg thanks for the addition. My recommendation is that we move this to `mlx-swift-examples` as a layer similar to how it is in MLX LM. It's still a fairly recent...
Interestingly, we have support for that model in Python, but it looks like it just [ignores the rope scaling](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/models/phi3.py#L36-L40). Somehow it still gives pretty good results for short generations...
> I am not sure what is intended with the array -- I guess they are supposed to correspond to the hidden layer counts? There are also two sets of...
This should be working as of #107
Your best bet for this is probably to use Metal custom kernels, which will be added in https://github.com/ml-explore/mlx-swift/pull/137. We also have custom functions (which we plan to add to...
> What do you mean by background app? Making a web service, e.g. some kind of LLM service?

Good question. It's been mentioned in two contexts:

- Training in the...
> My vote would be on not exposing the strides at all in Swift.

+1, it is a foot gun. It's not really intended to be used in the C++...
Sounds great to me. What do you think about these case names? They may be slightly clearer:

```swift
enum AccessMethod {
    case copy
    case noCopyIfContiguous
    case noCopy
}
```