
Results: 17 comments of LeonEricsson

I have the same issue. I am running in a fresh podman environment.

@awni perhaps we can leave this as T5 and then make an attempt at swapping to Llama in a new PR? I was thinking we could adopt the model format...

> Looks really nice! I think we can get this in soon. I didn't look yet at the core of the prompt decoder but left a few comments. > >...

> I've been poking around your code @LeonEricsson because I have some long summarization tasks that I'd like to speed up, but noticed a significant bottleneck from the loop. This...

@cmcmaster1 I've finally implemented a pure MLX version that should be comparable in performance to the NumPy one. It would be great if you could confirm this on your end. **However**, before...

> Haha awesome I was actually planning this after finishing my speculative decoding but didn't get to it, glad someone else did! :D thanks for laying the groundwork! I had...

You've since changed that part of the code, but it was about this line in my `decoder.py` file: `new_tokens = sampled[: max(1, num_to_accept + 1)]`. I'm fairly confident the accepted tokens...
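For context, a minimal sketch of what that slicing does (hypothetical function name, not the actual `decoder.py`): speculative decoding always emits at least one token per step, because even when every draft token is rejected the target model still contributes one resampled token, which is what the `max(1, ...)` guards:

```python
def accept_tokens(sampled, num_to_accept):
    # Keep the accepted draft tokens plus the one correction/bonus token
    # sampled from the target model. max(1, ...) guarantees we keep at
    # least one token even when num_to_accept == 0 (full rejection).
    return sampled[: max(1, num_to_accept + 1)]

# Illustrative values: 4 tokens sampled, first 2 drafts accepted.
tokens = [11, 42, 7, 99]
print(accept_tokens(tokens, 2))  # [11, 42, 7]
print(accept_tokens(tokens, 0))  # [11]
```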

> Yea I think you are correct and it should have that [behavior now](https://github.com/ml-explore/mlx-examples/blob/main/llms/speculative_decoding/decoder.py#L156-L159). Let me know if you still think it's an issue. Nice, looks good! I like that...

> Everything there except the inclusion of the `delta` parameter came from the [original paper](https://arxiv.org/abs/2211.17192). See Algorithm 1 and section 2.3. Thanks! > The intention is to normalize them since...
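For readers without the paper at hand, here is a minimal NumPy sketch (illustrative only, not the mlx-examples implementation) of the acceptance rule in Algorithm 1 of arXiv:2211.17192 that the normalization discussion refers to: accept the draft token with probability `min(1, p[x]/q[x])`, and on rejection resample from the normalized positive residual `(p - q)+`:

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_accept(p, q, x):
    """One acceptance step of speculative sampling.

    p: target-model distribution over the vocabulary
    q: draft-model distribution over the vocabulary
    x: token index proposed by the draft model
    """
    # Accept the draft token with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # Rejected: resample from the normalized residual (p - q)+,
    # which keeps the overall output distribution exactly p.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)

# Illustrative case: the target assigns more mass to token 0 than the
# draft does, so a proposed token 0 is always accepted.
p = np.array([0.7, 0.3])
q = np.array([0.3, 0.7])
print(speculative_accept(p, q, 0))  # 0
```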