Awni Hannun

Results 1014 comments of Awni Hannun

The memory needed for long prompts scales with the square of the prompt length. So in your case: `3500 * 3500 * num_heads * 2` would be the memory used...
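As a rough illustration of the quadratic scaling described above, here is a minimal sketch of that arithmetic. It assumes the trailing `2` in the formula is the number of bytes per element (for example 16-bit floats), which the comment does not state. The function name and the head count of 32 are hypothetical and chosen only for the example.

```python
def attention_scores_bytes(prompt_len: int, num_heads: int, bytes_per_element: int = 2) -> int:
    """Estimate the size of the full attention score matrix for one layer.

    Assumption: one (prompt_len x prompt_len) score matrix per head, stored at
    `bytes_per_element` bytes each (2 would correspond to 16-bit floats).
    """
    return prompt_len * prompt_len * num_heads * bytes_per_element


# Hypothetical head count; substitute the value from your model's config.
num_heads = 32

estimate = attention_scores_bytes(3500, num_heads)
print(f"3500-token prompt: {estimate} bytes ({estimate / 2**20:.1f} MiB)")

# Doubling the prompt length quadruples the estimate.
doubled = attention_scores_bytes(7000, num_heads)
print(f"7000-token prompt: {doubled} bytes ({doubled / estimate:.0f}x larger)")
```

This covers only the score matrix from the formula above, not weights, the KV cache, or other activations.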

Could you share what you ran that crashed?

@DePasqualeOrg thanks for the addition. My recommendation is that we move this to `mlx-swift-examples` as a layer similar to how it is in MLX LM. It's still a fairly recent...

Interestingly, we have support for that model in Python, but it looks like it just [ignores the rope scaling](https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/models/phi3.py#L36-L40). Somehow it still gives pretty good results for short generations...

> I am not sure what is intended with the array -- I guess they are supposed to correspond to the hidden layer counts? There are also two sets of...

Probably your best bet for this is to use Metal custom kernels, which will be added in https://github.com/ml-explore/mlx-swift/pull/137. We also have custom functions (which we plan to add to...

> What do you mean by background app? Making a web service, e.g. some kind of LLM service?

Good question. It's been mentioned in two contexts:

- Training in the...

> My vote would be on not exposing the strides at all in Swift.

+1, it is a footgun. It's not really intended to be used in the C++...

Sounds great to me. What do you think about these case names? They may be slightly clearer:

```swift
enum AccessMethod {
    case copy
    case noCopyIfContiguous
    case noCopy
}
```