AmineDiro
Hello, I recently saw ggerganov's PR https://github.com/ggerganov/llama.cpp/pull/3228, where he implemented parallel decoding for multiple sequences. Is there any plan to support this feature? This would basically provide a mechanism...
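For context, a minimal Python sketch of the idea (not llama.cpp's actual API; `toy_forward_batch`, `EOS_TOKEN`, and the `Sequence` type are illustrative assumptions): tokens from several independent sequences are packed into one batch, each tagged with a sequence id, so a single forward pass advances all of them.

```python
# Sketch of parallel decoding: one batched forward pass advances every
# unfinished sequence by one token. All names here are placeholders.
import random
from dataclasses import dataclass

EOS_TOKEN = 0
VOCAB_SIZE = 100

@dataclass
class Sequence:
    seq_id: int
    tokens: list
    finished: bool = False

def toy_forward_batch(batch):
    """Stand-in for a batched forward pass: one logits vector per batch entry."""
    return [[random.random() for _ in range(VOCAB_SIZE)] for _ in batch]

def decode_step(sequences):
    """Advance every unfinished sequence by one token with a single batched call."""
    active = [s for s in sequences if not s.finished]
    if not active:
        return False
    # One batch holds the last token of every active sequence plus its position.
    batch = [(s.seq_id, s.tokens[-1], len(s.tokens) - 1) for s in active]
    for seq, logits in zip(active, toy_forward_batch(batch)):
        next_token = max(range(VOCAB_SIZE), key=logits.__getitem__)  # greedy pick
        seq.tokens.append(next_token)
        if next_token == EOS_TOKEN:
            seq.finished = True
    return True

seqs = [Sequence(seq_id=i, tokens=[1]) for i in range(4)]
for _ in range(8):  # a few decoding steps over all sequences at once
    decode_step(seqs)
```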
> I don't think I want to reintroduce it again - especially as we should never need to do this again - but I would like to support 70B at...
Hi @pixelspark, thanks a lot for the detailed explanation. I'll start working on this; that's a good excuse to start learning `wgpu` 😃
Hi there @pixelspark, I started working on matmul broadcasting this weekend and encountered an issue with the existing implementation. I am new to `wgpu` but I have some CUDA programming...
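For reference, the broadcasting semantics the kernel needs to reproduce can be pinned down with NumPy; this is only a shape-level illustration, not the `wgpu` code:

```python
# Reference semantics a broadcasting matmul should reproduce
# (NumPy used only to show the expected shapes).
import numpy as np

a = np.random.rand(1, 8, 32, 64)   # batch dim of size 1 ...
b = np.random.rand(6, 8, 64, 16)   # ... broadcasts against batch dim 6

c = np.matmul(a, b)                # leading dims broadcast, last two are the matmul
assert c.shape == (6, 8, 32, 16)

# Equivalent loop form: every output batch index maps back to an input
# batch index (here index 0, since a's batch dim is 1), which is what
# the shader has to compute.
for i in range(6):
    expected = a[0, 0] @ b[i, 0]
    np.testing.assert_allclose(c[i, 0], expected, rtol=1e-6)
```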
Thanks for your response! > Hm, these are always tough issues to figure out... First of all, you should establish that this is not actually some rounding error (differences...
Hi @pixelspark, no worries, thanks for taking the time to review this PR! I didn't know where to put the function. I'll make the necessary modifications and push them!...
Hello @ncclementi, I built [daskqueue](https://github.com/AmineDiro/daskqueue), a lightweight distributed task queue library built on top of Dask. Daskqueue also implements persistent queues for holding tasks on disk and surviving Dask...
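A rough usage sketch (the `QueuePool`/`ConsumerPool` names and their parameters are recalled from the README and may not match the current API exactly):

```python
# Rough daskqueue usage sketch; class and argument names
# (QueuePool, ConsumerPool, n_queues, n_consumers) are assumptions
# and may differ from the actual API.
from distributed import Client
from daskqueue import QueuePool, ConsumerPool

def process(item: int) -> int:
    return item * item

if __name__ == "__main__":
    client = Client()                                  # local Dask cluster
    queue_pool = QueuePool(client, n_queues=2)         # queue actors on workers
    consumer_pool = ConsumerPool(client, queue_pool, n_consumers=4)

    consumer_pool.start()                              # consumers begin pulling tasks
    for i in range(100):
        queue_pool.submit(process, i)                  # enqueue work items
    consumer_pool.join()                               # wait until queues are drained
```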
Run on a local dump of ~5M vectors:
@FL33TW00D did the original work. Here is the model implementation that reaches 14 t/s; I am still having generation issues.

phi1.py
```python
from abc import ABC, abstractmethod

import torch
import ttnn
from...
```
Hi, thanks for the very detailed response! I'll take a look at the links 🙂