
Chat with local files (Retrieval Augmented Generation?)

Open cleesmith opened this issue 1 year ago • 5 comments

This is a cool project and works on my old Mac (Intel) and a newer Air (M1). It's slow, of course, due to my hardware, but it's so easy to get installed and running compared to all of the other ones I have tried ... thanks for that.

There is a similar project called BionicGPT (https://github.com/bionic-gpt/bionic-gpt) which is Docker-based and a bit more work (not too much) to get installed/running, but it allows you to upload or ingest local documents that can then be chatted with, summarized, and searched. Personally, I would be happy to just use plain text files; no need for .pdf, .docx, .html, etc.

Is this possible in the future for llamafile?

cleesmith avatar Dec 16 '23 17:12 cleesmith

Very nice idea; I had something very similar in mind: #137

do-me avatar Dec 23 '23 12:12 do-me

llamafile provides the lower-level fundamentals needed for file/URL summarization. Please check out the blog post I recently wrote at https://justine.lol/oneliners/, which explains how to do this on the command line. For example, if you wanted to summarize a text file named ESSAY.TXT on your computer using the Mistral main llamafile, you could say:

(
  echo '[INST]Summarize the following text:'
  cat ESSAY.TXT
  echo '[/INST]'
) | ./mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile \
      -c 6700 \
      -f /dev/stdin \
      --temp 0 \
      -n 500 \
      --silent-prompt 2>/dev/null
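
For reference: -c sets the context window size in tokens, -f reads the prompt from the given file (here stdin), --temp 0 makes the output deterministic, -n limits how many tokens are generated, --silent-prompt keeps the prompt itself from being echoed back, and 2>/dev/null discards the logging on stderr.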

@stlhood I'll leave this open for you to comment further. We could, for instance, add an upload feature to the web GUI.

jart avatar Dec 28 '23 00:12 jart

This would be very useful.

Another alternative is something similar to what is being achieved at https://github.com/imartinez/privateGPT or https://github.com/PromtEngineer/localGPT: the ability to ingest files and then chat with them via the model, or maybe point it at a folder where all the files are stored.
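
For small collections, the folder idea can even be approximated with the one-liner pattern above by concatenating everything into a single prompt. A minimal sketch, assuming plain-text files in a hypothetical ~/notes directory whose combined length still fits in the context window (real RAG would retrieve only the relevant chunks instead):

(
  echo '[INST]Answer questions about the following documents:'
  cat ~/notes/*.txt   # hypothetical folder of plain-text files
  echo 'Question: What are the main themes across these notes?[/INST]'
) | ./mistral-7b-instruct-v0.1-Q4_K_M-main.llamafile \
      -c 6700 \
      -f /dev/stdin \
      --temp 0 \
      -n 500 \
      --silent-prompt 2>/dev/null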

francisco-lafe avatar Dec 29 '23 13:12 francisco-lafe

I'm not familiar with the trick they use (prompting, fine-tuning, LoRA, etc.) to achieve that behavior, but it'd potentially be a fun feature to explore. I'd be interested in hearing what @stlhood thinks.

jart avatar Dec 29 '23 15:12 jart

Also, maybe the popular AI chat thingies are already doing this and I just lack the prompting skills, but it would be nice to have a "semantic search" (search by meaning) for local documents ... similar to: https://github.com/freedmand/semantra.
I suppose an all-in-one search solution is off in the future, and being able to flip between keyword, pattern, and semantic search would be ideal. Summarization is nice and useful, but often a bit more is needed to deep dive. The command-line summarization example above was helpful, thank you.
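
As a rough sketch of what the semantic piece could look like with llamafile today: the server mode inherits llama.cpp's /embedding endpoint, so each document can be scored against a query by cosine similarity. Everything below is an assumption for illustration (the server llamafile name, the --embedding flag, port 8080, and doc1.txt); it's a sketch, not a turnkey semantic search:

# assumed setup: a llamafile running in server mode with embeddings enabled, e.g.
#   ./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile --embedding --port 8080

embed() {  # fetch an embedding vector (a JSON array) for arbitrary text
  curl -s http://localhost:8080/embedding \
       -H 'Content-Type: application/json' \
       -d "{\"content\": $(jq -Rs . <<<"$1")}" | jq '.embedding'
}

q=$(embed "how do I renew my passport")
d=$(embed "$(cat doc1.txt)")

# cosine similarity of the two vectors, computed in jq;
# ranking documents by this score gives a crude semantic search
jq -n --argjson a "$q" --argjson b "$d" '
  ([$a, $b] | transpose | map(.[0] * .[1]) | add)
  / (($a | map(. * .) | add | sqrt) * ($b | map(. * .) | add | sqrt))'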

cleesmith avatar Dec 31 '23 16:12 cleesmith

RAG is a powerful use for LLMs, no doubt about it, but we are not currently considering adding RAG to llamafile. Our reasoning follows:

Adding RAG functionality to llamafile would rather dramatically change the scope and intent of the project. Doing RAG well is not straightforward, as evidenced by the many excellent open source projects (like privateGPT) and startups (like LlamaIndex) that have moved into this space over the last few months. Instead of trying to cram some sort of minimal/naive RAG implementation into llamafile, we'd rather see llamafile integrated into existing RAG projects and products. This way, llamafile could help empower these projects to leverage local inference with open models (instead of relying on OpenAI).
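
One concrete hook for such integrations: the llamafile server speaks an OpenAI-style chat completions API, so a RAG stack that already talks to OpenAI can often be repointed at a local llamafile just by changing the base URL. A minimal sketch, assuming a llamafile running in server mode on the default port 8080 (the model field is effectively a placeholder here):

curl -s http://localhost:8080/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -d '{
           "model": "local",
           "messages": [{"role": "user", "content": "Say hello."}]
         }'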

If anyone reading this is involved in any existing RAG projects out there, please feel free to mention llamafile to their maintainer(s) -- we'd love to work with them...

stlhood avatar Jan 02 '24 03:01 stlhood

I think @stlhood is right. There are a lot of people doing great work on RAG, and I think the best way we can support them is by keeping llamafile focused on what it does best, which is making LLM software easily distributable and accessible. I'm going to close this issue now. Thank you for reaching out and asking!

jart avatar Jan 02 '24 08:01 jart