Awni Hannun
I think a more sustainable way to do this is the following:
- Have a field in the YAML which gives the layer keys to apply LoRA to
- Make...
> I'd also like to add the ability to configure LoRA alpha and rank via config parameters as well. Should that be broken out into a separate PR?

Great! I...
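For illustration, here is a minimal sketch of how such config fields could drive LoRA application once parsed from the YAML. The key names (`lora_parameters`, `keys`, `rank`, `alpha`) are assumptions for the sketch, not the actual mlx-examples schema:

```python
# Hypothetical sketch: select LoRA layers and hyperparameters from a config
# dict (as parsed from YAML). All key names here are assumptions.
config = {
    "lora_parameters": {
        # Which module paths LoRA should be applied to.
        "keys": ["self_attn.q_proj", "self_attn.v_proj"],
        "rank": 8,
        "alpha": 16.0,
    }
}

def should_apply_lora(module_path: str, cfg: dict) -> bool:
    """True if a module path (e.g. "model.layers.0.self_attn.q_proj")
    matches one of the configured LoRA keys."""
    return any(module_path.endswith(k) for k in cfg["lora_parameters"]["keys"])

def lora_scale(cfg: dict) -> float:
    """Standard LoRA scaling factor, alpha / rank."""
    p = cfg["lora_parameters"]
    return p["alpha"] / p["rank"]
```

With this shape, adding rank and alpha to the same config section keeps all LoRA settings in one place, which is one argument for not splitting it into a separate PR.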
Just to be clear, the error happens when you run `convert.py` to get the GGUF? Where is that script? It does look like a llama.cpp issue to me. Are...
Nice idea. We can look into something like that. It may actually not be too bad since we can already export to GGUF via MLX. It's mostly a matter of...
Just the title I suppose. I will add the appropriate label.
Thanks, that'd be great!
Oh much better, thank you 😄
What about adding your test (modified for unittest) as a test case in a new test file `test_server.py` in the tests directory: https://github.com/ml-explore/mlx-examples/tree/main/llms/tests?
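As a rough sketch of what a unittest-style `test_server.py` could look like: the `build_response` function below is a hypothetical stand-in for the server logic under test, and the response shape is an assumption, not the actual server API.

```python
# Hypothetical sketch of tests/test_server.py. In the real test, the
# stand-in below would be replaced by calls against the mlx-examples server.
import unittest

def build_response(prompt: str) -> dict:
    # Stand-in for the server's completion logic; shape is an assumption.
    return {"prompt": prompt, "choices": [{"text": ""}]}

class TestServer(unittest.TestCase):
    def test_response_shape(self):
        resp = build_response("hello")
        self.assertIn("choices", resp)
        self.assertEqual(resp["prompt"], "hello")
```

Dropping the file into the tests directory lets it run alongside the others via `python -m unittest discover` from that directory.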
Hi! Is this ready to be reviewed again?
Could you say more about what you are looking for? Distributed is a pretty generic term. What exactly would you like to distribute? Training / inference? At what granularity? Any...