Awni Hannun
Thanks for the input @bigsnarfdude
@gavi that is a very good point and thanks for the script. One could even decrease the maximum line length to 1024 or 512 for even more memory savings depending...
Oh, good catch! I will add it to an outstanding diff (#219) I have to update that example. Thank you!
Thanks @rfdougherty ! Will do!
That would be great!
> I had a review comment on your PR that was ignored but I think its quite important, any thoughts on that?

What was your comment? I couldn't find it.
Yea I think you are correct and it should have that [behavior now](https://github.com/ml-explore/mlx-examples/blob/main/llms/speculative_decoding/decoder.py#L156-L159). Let me know if you still think it's an issue.
> Nice, looks good! I like that you avoided main model sampling and instead just compare draft / main probs directly. Do you have a reference for this code, I...
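For anyone following along: the "compare draft / main probs directly" approach mentioned above is the standard speculative-decoding acceptance test. A minimal sketch (not the repo's actual code; `p_main` and `p_draft` here are assumed to be the two models' probabilities for the same candidate token):

```python
import random

def accept_draft_token(p_main: float, p_draft: float) -> bool:
    """Standard speculative-decoding acceptance rule.

    Always accept when the main model assigns at least as much
    probability to the drafted token as the draft model did;
    otherwise accept stochastically with probability p_main / p_draft.
    This keeps the accepted-token distribution identical to sampling
    from the main model alone.
    """
    if p_main >= p_draft:
        return True
    return random.random() < (p_main / p_draft)
```

The advantage of this formulation is that you never need to sample from the main model for tokens the draft already proposed; you only fall back to the main model's (residual) distribution on a rejection.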
Hey @LeonEricsson, I like this PR a lot, but TBH I'm not entirely sure what to do with it. Prompt lookup decoding is a bit niche to dedicate...
> I haven't looked through the speculative example thoroughly since the change to T5 but I'll give it a look and try to decide what's most appropriate between 1) and...