Forkoz

Results: 474 comments by Forkoz

GGUF models are only faster for me with the nvlink patch and when fully offloaded. Q5 is going to need some CPU unless you have 3 cards, and that's gonna be slow.

70b doesn't fit on 1 4090 so half (or more) of it is on CPU.
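
To put rough numbers on that, here is a back-of-the-envelope sketch. It is not how any loader actually splits layers. The bits-per-weight values are assumed stand-ins for typical 4-bit and 5-bit quants, the 4090's 24 GB is treated as 24 GiB, and the helper names are made up for illustration. It only counts weights, so the real CPU share is higher once KV cache and buffers take their cut of VRAM.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fraction_on_cpu(params_billion: float, bits_per_weight: float, vram_gib: float) -> float:
    """Lower bound on the share of weights that has to stay on CPU."""
    size = weight_gib(params_billion, bits_per_weight)
    return max(0.0, 1.0 - vram_gib / size)


# Assumption: ~4.5 and ~5.5 bits per weight as rough 4-bit / 5-bit quant sizes.
for bpw in (4.5, 5.5):
    size = weight_gib(70, bpw)
    spill = fraction_on_cpu(70, bpw, vram_gib=24)
    print(f"70b @ {bpw} bpw: {size:.1f} GiB of weights, at least {spill:.0%} on CPU")
```

Swap in your own parameter count, quant size, and VRAM to see how much would spill before you download anything.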

The readme was pretty good. In terms of hardware you're on your own. Can use small models on 1 GPU or big models. Am testing it with 103-120b @ 16k...

Stuff like the new Forge and ComfyUI do. Also, they only use the GGUF format.