Awni Hannun


How long is that prompt? Do you mind copying it here in text form so I can try it directly?

Can confirm: it's really slow on the longer prompt.

@jeanromainroy if it's possible, can you try rebooting your machine? That seems to resolve the speed issue on my end. I can generate quite quickly with the prompt you provided.

Ok let me see if I can reproduce the bad state. Just starting and killing the flask server is enough to make it slow down? That's pretty wild.

I ran the server / flask app you posted, then ctrl+c'd it. Then I ran the model normally and it was the same speed (generating reasonably fast, e.g. about 7.5 tps)....

Thanks for the data point. Still looking into a better solution for that.

> With MLX it seems to use all of the GPU, but when it starts generating the output, it drops significantly and seems to rely on CPU. What do you...

Did you try setting the sysctl `sudo sysctl iogpu.disable_wired_collector=1`? That usually helps.
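For reference, a minimal sketch of applying and checking that setting (assuming a recent macOS that exposes the `iogpu.disable_wired_collector` key; it needs admin rights and resets on reboot):

```shell
# Apply the setting (takes effect immediately, does not persist across reboots)
sudo sysctl iogpu.disable_wired_collector=1

# Verify the current value
sysctl iogpu.disable_wired_collector
```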

I think it's only available on 15.0. Did it improve the token generation speed / GPU utilization?

This is cool, and I think it would be nice to support. We might be able to do it with a far smaller diff, however. Something like:

- Have a...