Alex Cheema

418 comments by Alex Cheema

I've assigned you both, @komikat and @AReid987. You'll both receive the bounty for any meaningful work towards this. Feel free to work independently or together, up to you.

> hi @AlexCheema, llama.cpp [seems](https://github.com/ggerganov/llama.cpp/discussions/6404) to natively support sharding using gguf-split, could we just use that to shard the downloaded gguf and run it on connected nodes? I also feel...

> I'm not sure if there is a way to run .gguf files on PyTorch. Running Hugging Face models can be done, but they would have to be dequantised. Since there already is a...
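For context on the dequantisation point: GGUF's Q8_0 format stores weights in blocks of 32, each block holding one fp16 scale followed by 32 int8 values, and dequantising is just `scale * quant` per weight. Here's a minimal illustrative sketch (not exo's or llama.cpp's actual code) of that round trip:

```python
import struct

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32

def dequantize_q8_0(raw: bytes) -> list:
    """Dequantise raw Q8_0 block data back to float weights.

    Each block is a 2-byte fp16 scale followed by 32 int8 quants.
    """
    out = []
    block_bytes = 2 + BLOCK_SIZE
    for off in range(0, len(raw), block_bytes):
        (scale,) = struct.unpack_from("<e", raw, off)                  # fp16 scale
        quants = struct.unpack_from(f"<{BLOCK_SIZE}b", raw, off + 2)   # int8 weights
        out.extend(scale * q for q in quants)
    return out

# Round-trip example: quantise one block of toy weights, then dequantise it.
weights = [(i - 16) / 10.0 for i in range(32)]
scale = max(abs(w) for w in weights) / 127.0
quants = [round(w / scale) for w in weights]
raw = struct.pack("<e", scale) + struct.pack("<32b", *quants)
recovered = dequantize_q8_0(raw)
```

The recovered values match the originals to within the quantisation error of one int8 step, which is why a dequantisation pass is needed before handing the weights to a framework that expects floats.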

> Hi @AlexCheema, I'd love to work on adding support for Llama 3.2 1B in tinygrad. Thanks! Sanchay

Go for it!

> Hello. Is the YAML I sent you via Discord good for you? Thanks in advance. Best regards, Benjamin.

It looks a bit overcomplicated. Basically you...

I'm concerned about increasing the timeout this much. If a request takes this long, it should be treated differently. Request handling generally needs to be reworked with...
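A minimal sketch of the "treat long requests differently" idea, assuming an asyncio-based handler (all names here are hypothetical, not exo's actual API): normal requests stay bounded by a short timeout, while known long-running jobs take a separate path instead of inflating the global timeout.

```python
import asyncio

NORMAL_TIMEOUT = 0.1  # seconds; kept tiny so the example runs quickly

async def handle_request(coro_fn, long_running=False):
    """Run a request; only 'normal' requests are bounded by the short timeout."""
    if long_running:
        # Long jobs bypass the short timeout; in a real server they would be
        # queued and tracked separately rather than blocking the normal path.
        return await coro_fn()
    return await asyncio.wait_for(coro_fn(), timeout=NORMAL_TIMEOUT)

async def main():
    async def quick():
        await asyncio.sleep(0.01)
        return "ok"

    async def slow():
        await asyncio.sleep(1.0)
        return "done"

    results = [await handle_request(quick)]
    try:
        await handle_request(slow)  # too slow for the normal path
    except asyncio.TimeoutError:
        results.append("timed out")
    results.append(await handle_request(slow, long_running=True))
    return results

results = asyncio.run(main())
print(results)
```

The point is that raising one shared timeout to cover the slowest request makes every hung request linger that long; routing slow work explicitly keeps the fast path responsive.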

> Hi, I would like to work on this.

Assigned. Good luck - pls tag me here or on Discord if you have any questions or run into bugs!

Increased bounty to 500 USD as this appears to be harder than anticipated.

No activity for a month. Opening this back up.

> Hey, sorry, I was busy with college. While working on this I found out that there was a race condition within the inference engines. When I tried...
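For readers unfamiliar with the bug class mentioned above, here is a generic sketch of the kind of race two engine threads can hit on shared state, and the lock that fixes it. The shared counter and names are hypothetical, not exo's actual code; it only shows the shape of the bug.

```python
import threading
import time

def worker(state, lock=None, iters=50):
    """Increment a shared counter, with or without synchronisation."""
    for _ in range(iters):
        if lock is None:
            # Unsynchronised read-modify-write: a classic race.
            current = state["tokens"]
            time.sleep(0.0005)  # widen the window between read and write
            state["tokens"] = current + 1
        else:
            with lock:  # the fix: serialise the read-modify-write
                state["tokens"] += 1

def run(lock=None):
    state = {"tokens": 0}
    threads = [threading.Thread(target=worker, args=(state, lock)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["tokens"]

racy, locked = run(), run(threading.Lock())
print(racy, locked)  # racy typically loses updates; locked is exactly 100
```

With two threads each doing 50 increments, the locked version always ends at 100, while the unlocked version usually loses updates because both threads read the same value before either writes it back.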