AmineDiro

Results: 12 comments by AmineDiro

Hello, I recently saw ggerganov's PR https://github.com/ggerganov/llama.cpp/pull/3228, where he implemented parallel decoding for multiple sequences. Are there any plans to support this feature? This would basically provide a mechanism...

> I don't think I want to reintroduce it again - especially as we should never need to do this again - but I would like to support 70B at...

Hi @pixelspark, thanks a lot for the detailed explanation. I'll start working on this; it's a good excuse to start learning `wgpu` 😃

Hi there @pixelspark, I started working on matmul broadcasting this weekend and encountered an issue with the existing implementation. I am new to `wgpu` but I have some CUDA programming...

Thanks for your response! > Hm, these are always tough issues to figure out... First of all, you should establish that this is not actually some rounding error (differences...

Hi @pixelspark, no worries, thanks for taking the time to review this PR! I didn't know where to put the function. I'll make the necessary modifications and push them!...

Hello @ncclementi, I built [daskqueue](https://github.com/AmineDiro/daskqueue), a lightweight distributed task queue library built on top of Dask. Daskqueue also implements persistent queues for holding tasks on disk and surviving Dask...

Run on local dump of ~5M vectors: ![image](https://github.com/user-attachments/assets/42a2a7e9-b276-4b66-9273-e2f209993e07)

@FL33TW00D did the original work. Here is the model impl for 14t/s. I am still having generation issues. phi1.py ```python from abc import ABC, abstractmethod import torch import ttnn from...

Hi, thanks for the very detailed response! I'll take a look at the links 🙂