Fred Bliss
No luck getting it to generate anything meaningful - just empty data responses. Probably need to spend more time with it than I did.
nice! able to share the lora or prompts by chance?
Which model (or model+lora) are you using for this?
@MichaelMartinez I was about to start working on my own lightweight integration for langchain and other tooling in the ecosystem - happy to work together on something if there's interest...
Good idea. You can make this work at a basic level by chaining together search and PAL. See example below: # Get the cost of an EC2 instance per hour...
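To make the search + PAL idea above concrete, here is a minimal sketch of the pattern: a search step fetches a fact, then a PAL (program-aided language model) step turns the question into a small program that computes the answer. Everything here is a stand-in — the `search` stub, the hard-coded price, and the hand-written "generated" program all mimic what a real LangChain agent with a search tool and an LLM would produce.

```python
# Sketch of the search -> PAL pattern. In a real setup, search() would be
# a web-search tool and pal() would run code written by an LLM; both are
# stubbed here for illustration only.

def search(query: str) -> str:
    """Stub search tool: returns the kind of fact a web search might find."""
    facts = {"t3.micro hourly price": "$0.0104 per hour"}
    return facts.get(query, "no result")

def pal(question: str, fact: str) -> float:
    """Stub PAL step: an LLM would emit this arithmetic as Python code;
    here we hand-write the program it would generate."""
    hourly = float(fact.split("$")[1].split()[0])  # parse "$0.0104 per hour"
    hours_per_month = 24 * 30
    return hourly * hours_per_month

fact = search("t3.micro hourly price")
monthly_cost = pal("What does a t3.micro cost per month?", fact)
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
```

The key design point is the division of labor: search supplies facts the model doesn't know, and PAL offloads the arithmetic to generated code instead of trusting the model to compute in-context.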
updated repo here: https://github.com/yangkevin2/doc-story-generation
i'm happy to help with this one as well. vicuna's been by far the most promising local model i've tested to date.
@BillSchumacher been watching your work in autogpt, you must know this stuff inside and out by now. excited to try it out later tonight - thanks!
Yeah, cpu inference is nowhere near the experience you can get with gpu, even at 4bit
> There is some ongoing work to use GPTQ to compress the models to 3 or 4 bits in this [repo](https://github.com/qwopqwop200/GPTQ-for-LLaMa). Also a discussion going on over at the oobabooga...
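For intuition about what the GPTQ work linked above is doing: GPTQ itself uses second-order (Hessian-based) error correction, but the baseline it improves on is plain round-to-nearest quantization. The sketch below shows only that naive baseline — mapping each weight row to 4-bit integers plus a per-row scale — not GPTQ's actual algorithm.

```python
# Naive symmetric round-to-nearest 4-bit quantization of one weight row.
# This is the simple baseline GPTQ improves on, shown for intuition only.

def quantize_4bit(row):
    """Map weights to integers in [-7, 7] with a per-row scale."""
    scale = max(abs(w) for w in row) / 7  # symmetric int4 range
    q = [round(w / scale) for w in row]
    return q, scale

def dequantize(q, scale):
    """Recover approximate weights from the quantized integers."""
    return [v * scale for v in q]

row = [0.21, -0.7, 0.05, 0.33]
q, scale = quantize_4bit(row)
recovered = dequantize(q, scale)
```

Per-row (or per-group) scales keep the rounding error bounded by half a quantization step; GPTQ's contribution is reordering and compensating those rounding decisions so the layer's output error, not just the per-weight error, stays small.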