Mark Schmidt
The Colab demo is meant to run on a free Google Colab GPU, not on a local runtime (and definitely not on CPU). If you want to run ChatGLM on a...
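A minimal sketch of the kind of GPU check the demo assumes (the `THUDM/chatglm-6b` repo id and the `chat` call follow the model's published usage; adjust to your checkpoint):

```python
# Minimal sketch: verify a CUDA GPU is available before loading ChatGLM.
# Assumes the THUDM/chatglm-6b checkpoint from Hugging Face.
import torch
from transformers import AutoModel, AutoTokenizer

assert torch.cuda.is_available(), "Use a GPU runtime (Runtime > Change runtime type > GPU)"

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
```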
I get awful results with Flan-UL2. Its responses tend to be extremely short and it hallucinates more than most models when it doesn't know something. I have had no issues...
Here's an example of a question to Flan-UL2 where it is both wrong and characteristically short, even when asked to explain. (Gears 1 and 6 spin in opposite directions, as...
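For reference, the direction parity in a simple meshed gear train is trivial to check; a quick sketch (assuming gears are numbered from 1 and each gear meshes with the next):

```python
# Minimal sketch: adjacent meshed gears alternate direction, so a gear's
# spin direction depends only on the parity of its position in the train.
def spin_direction(gear: int, first_gear_clockwise: bool = True) -> str:
    clockwise = first_gear_clockwise if gear % 2 == 1 else not first_gear_clockwise
    return "clockwise" if clockwise else "counterclockwise"

print(spin_direction(1))  # clockwise
print(spin_direction(6))  # counterclockwise -> gears 1 and 6 spin in opposite directions
```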
I've been contributing to most of the listed projects daily for a while and would love to help maintain a list like this. Let me know.
Vicuna appears to be trained to use:
```
### Assistant: Text
### Human: Text
```
Using "### Human:" as a reverse prompt partially works. But instruct mode support could be...
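A sketch of how a frontend could wrap that template (the stop-string handling here is my assumption about how instruct mode would implement it, not any project's actual code):

```python
# Minimal sketch: build a Vicuna-style prompt and truncate the completion at
# the "### Human:" reverse prompt so the model doesn't talk to itself.
ASSISTANT = "### Assistant:"
HUMAN = "### Human:"

def build_prompt(history: list[tuple[str, str]], user_message: str) -> str:
    turns = [f"{HUMAN} {u}\n{ASSISTANT} {a}" for u, a in history]
    turns.append(f"{HUMAN} {user_message}\n{ASSISTANT}")
    return "\n".join(turns)

def trim_reply(completion: str) -> str:
    # Stop at the reverse prompt if the model starts generating the next human turn.
    return completion.split(HUMAN, 1)[0].strip()

print(build_prompt([], "Do gears 1 and 6 spin the same way?"))
```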
For comparison, Alpaca-7B took 3 hours on 3xA100, and LoRA/PEFT reduces compute requirements by two orders of magnitude for similar results. So likely only a couple of hours, and also likely...
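The parameter reduction is easy to see directly; a sketch using the Hugging Face peft library (the model path, rank, and target modules are illustrative, not Alpaca's exact recipe):

```python
# Minimal sketch: wrap a base model with LoRA adapters and compare trainable
# parameter counts. Path and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/llama-7b")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                    lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # trainable params are a small fraction of the 7B total
```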
@vgoklani Generally you must merge the 16-bit PEFT adapter into the 16-bit base model and then quantize the resulting merged model down to 4-bit if you want 4-bit inference. The quality of...
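A sketch of the merge step with peft (the paths are placeholders; the 4-bit step afterwards would be whatever quantizer you use, e.g. GPTQ or llama.cpp's quantize tool):

```python
# Minimal sketch: merge a 16-bit LoRA adapter into its 16-bit base model,
# then save the merged weights for a separate 4-bit quantization pass.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base-model", torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = merged.merge_and_unload()             # folds the LoRA deltas into the base weights
merged.save_pretrained("path/to/merged-fp16")  # quantize this output to 4-bit afterwards
```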
> > I've tried 7B full fine-tune Alpaca and a 7B LoRA and I find the LoRA to be greatly lacking

But was the LoRA created in 16-bit or...
@DataBassGit I see that PR got closed. What's the status of your fork?
GPT4all supports x64 and every architecture llama.cpp supports, which is practically every architecture (even non-POSIX targets, and WebAssembly). Their motto is "Can it run ~Doom~ LLaMA" for a reason. Ooga supports GPT4all...