David-AU-github
Is this in the main branch / available? This could solve an issue merging 7B and 13B models, as well as 10.7B/11B and 13B models, and issues with 20B /...
Thank you; I will try this out. I got the two files and will give it a go. My case is the opposite -> expanding a model. First ->...
Try quantizing with the flag --leave-output-tensor. For IQ3_XS ... it may help? This flag will raise the file size slightly, but keeps the output tensor at the original FP16/FP32 regardless of imatrix or reg...
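For reference, a hedged sketch of the invocation (binary name, model paths, and the IQ3_XS type spelling are illustrative; recent llama.cpp builds name the tool llama-quantize, so check your build's `--help` output):

```shell
# Quantize to IQ3_XS but keep the output tensor at its original
# FP16/FP32 precision; the file grows slightly as a result.
./llama-quantize --leave-output-tensor model-f16.gguf model-iq3_xs.gguf IQ3_XS
```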
I have noticed the same issue -> conducted tests as follows (via LM Studio): 1 - 6: long-form output (one prompt, no regen, one shot) -> GPU (CUDA/NVIDIA) 2...
Here are the prompt and method to reproduce the results; for clarity, GPU only and CPU only. (I can also create a PDF with the results, as per test...
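A minimal sketch of how the GPU-only and CPU-only runs could be reproduced from the command line with llama.cpp's llama-cli (model path and prompt are placeholders; `-ngl` controls how many layers are offloaded to the GPU):

```shell
# GPU-only: offload all layers (requires a CUDA build).
./llama-cli -m model.gguf -ngl 99 --seed 1 -p "PROMPT"

# CPU-only: offload zero layers.
./llama-cli -m model.gguf -ngl 0 --seed 1 -p "PROMPT"
```

Fixing the seed keeps sampling comparable between the two runs, so any remaining divergence comes from the backend math rather than the sampler.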
My two cents here: "With the patch" -> word choice is more nuanced and precise, and sentence structure is also somewhat better. It is definitely higher quality. That being said, "general...
> Regarding the patch, on further thought, the computation is correct even without it since we handle the "leftover" elements in the last non-64 block: > > https://github.com/ggerganov/llama.cpp/blob/8cc91dc63c0df397d644a581b2cbeea74eb51ae0/ggml.c#L1537-L1541 > >...
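To illustrate the "leftover elements" point being quoted: the kernels walk the data in fixed-size blocks and then finish the tail separately, so the remainder is still accounted for. A self-contained sketch of that pattern (not the actual ggml.c code, which is vectorized):

```c
#include <assert.h>

/* Sum n floats by processing full blocks of 64 first, then the
 * leftover tail, mirroring the block/remainder split in the kernel. */
static float sum_blocked(const float *x, int n) {
    float sum = 0.0f;
    int i = 0;
    const int nb = (n / 64) * 64;  /* elements covered by full blocks */
    for (; i < nb; i += 64)
        for (int j = 0; j < 64; ++j)
            sum += x[i + j];
    for (; i < n; ++i)             /* leftover non-64-block elements */
        sum += x[i];
    return sum;
}
```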
Thank you for all you do. And thank you for the reply too. FYI: Finally got llama.cpp installed on my windows machine. Put the details and fixes in another ticket...
For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via CUDA_DOCKER_ARCH
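For example (image tag, Dockerfile path, and the compute_86 value are illustrative; pick the architecture that matches your GPU):

```shell
# Build the CUDA image with an explicit target architecture,
# required when the base image's CUDA version is < 11.7.
docker build -t local/llama.cpp:cuda \
  --build-arg CUDA_DOCKER_ARCH=compute_86 \
  -f .devops/main-cuda.Dockerfile .
```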
Windows 11 / NVIDIA 4060 Ti 16 GB - same issue -> getting the GPU to work. NOTE: the following was done beforehand: - Visual Studio Community (2022) must be installed. With...
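The CUDA build itself can be sketched as follows, run from a developer prompt after Visual Studio 2022 and the CUDA toolkit are installed (older llama.cpp releases spell the flag LLAMA_CUBLAS instead of GGML_CUDA):

```shell
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```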
> If DARE-Ties gives dramatically different results each time, maybe I don't understand it correctly, but that sounds less like a good thing and more like a bad thing. This...
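The run-to-run variation is inherent to DARE's design: delta parameters are randomly dropped and the survivors rescaled, so each seed keeps a different subset while preserving the expected value. A minimal NumPy sketch of that drop-and-rescale step (function name and signature are my own, not mergekit's API):

```python
import numpy as np

def dare_drop(delta, p, seed):
    """Zero each delta parameter with probability p, then rescale the
    survivors by 1/(1-p) so the expected value is preserved."""
    rng = np.random.default_rng(seed)
    keep = rng.random(delta.shape) >= p  # keep with probability 1-p
    return delta * keep / (1.0 - p)

# Different seeds keep different parameter subsets, which is why
# repeated DARE-TIES merges of the same models can differ.
a = dare_drop(np.ones(1000), p=0.5, seed=0)
b = dare_drop(np.ones(1000), p=0.5, seed=1)
```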