Improving the quality of the results
Hi,
I'm using this with langchainrb (https://github.com/andreibondarev/langchainrb).
The purpose is to chain user-added text in as "context" and create a response based on the "prompt".
- First, I store user-supplied "extra documents" and "text files" in a VectorStore (a locally running Chroma DB) after chopping them up with:

```ruby
splitter = Baran::CharacterTextSplitter.new(
  chunk_size: 500,
  chunk_overlap: 50,
  separator: ""
)
```
- I implemented my own LangchainLlamaCpp class on top of the "langchainrb" gem to use this "llama_cpp.rb" gem as a new LLM, so everything runs locally and completely offline with no online GPT APIs.
- In that new class, I required this gem ("llama_cpp.rb") and then use:

```ruby
params = LLaMACpp::ContextParams.new
params.n_ctx = DEFAULTS[:n_ctx]
@client = LLaMACpp::Context.new(model_path: "model.bin", params: params)
LLaMACpp.generate(@client, prompt, n_threads: 4)
```
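Put together, a minimal sketch of that wrapper class looks like this (DEFAULTS and the `complete` method name are my own choices, not part of either gem):

```ruby
require "llama_cpp"

class LangchainLlamaCpp
  DEFAULTS = { n_ctx: 2048, n_threads: 4 }.freeze

  def initialize(model_path:)
    params = LLaMACpp::ContextParams.new
    params.n_ctx = DEFAULTS[:n_ctx]
    # Load the model once; every generation reuses this context.
    @client = LLaMACpp::Context.new(model_path: model_path, params: params)
  end

  # The completion-style entry point langchainrb calls into.
  def complete(prompt:)
    LLaMACpp.generate(@client, prompt, n_threads: DEFAULTS[:n_threads])
  end
end
```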
- After that, langchainrb literally chains the query to the VectorStore (Chroma DB) and then to the LLM (llama_cpp.rb), as sketched below.
- While I get partially correct answers related to the context I provided in the files, the answer quality is not as good as this Python code that I also run: https://github.com/imartinez/privateGPT. That one is very slow, though, so I avoid it; my Ruby code takes roughly 15-30 seconds versus 200-300 seconds for the Python code.
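A hedged sketch of that chaining step, assuming langchainrb's Vectorsearch API (the shape of the records returned by `similarity_search` can differ between versions, and the question and prompt template are illustrative):

```ruby
question = "What does the uploaded contract say about termination?"

# `chroma` is the Langchain::Vectorsearch::Chroma client holding the chunks.
results = chroma.similarity_search(query: question, k: 4)
context = results.map(&:document).join("\n\n")

# Stuff the retrieved chunks into the prompt, privateGPT-style.
prompt = <<~PROMPT
  Use the following context to answer the question.

  Context:
  #{context}

  Question: #{question}
  Answer:
PROMPT

answer = llm.complete(prompt: prompt)
```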
- I also used "sentence-transformers" through system calls, shelling out to Python code with backticks from Ruby, to create embeddings with "MiniLM-L6-v2", as I couldn't figure out how to work with embeddings and sentence transformers in Ruby. There are no Ruby gems that do what "sentence-transformers" does in Python. A rough sketch of the bridge follows.
Long story short, I essentially reimplemented privateGPT (Python, https://github.com/imartinez/privateGPT) in Ruby, following its exact structure and copying everything it does. It works, but with lower quality. I'm using it with the Vicuna 13B model.
I'm not sure which settings to customize to get higher-quality answers. Some of my answers also include special characters like [ } @ ' " `, which don't make sense in a readable sentence; I'm wondering if I'm not escaping something correctly. That doesn't happen with the Python code mentioned above.
Do I need to monkey patch the "generate" method to change some values there (temperature, etc.)?
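If it matters, the kind of monkey patch I have in mind is prepending a module onto LLaMACpp's singleton class. Which sampling keywords `generate` actually accepts depends on the llama_cpp.rb version, so this sketch only delegates:

```ruby
# Calls to LLaMACpp.generate now pass through this override first.
module GenerateOverride
  def generate(context, prompt, **kwargs)
    # Inspect or rewrite kwargs here (e.g. inject sampling defaults)
    # before delegating to the gem's original implementation.
    super(context, prompt, **kwargs)
  end
end

LLaMACpp.singleton_class.prepend(GenerateOverride)
```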
Hi @mldev94, I randomly found this thread.
> I implemented my own LangchainLlamaCpp class on top of the "langchainrb" gem to use this "llama_cpp.rb" gem as a new LLM, so everything runs locally and completely offline with no online GPT APIs.
If you think your LLM class is a good candidate to add to Langchain.rb, please feel free to open up a PR in our repo. I would be happy to help out and provide any feedback!
> I also used "sentence-transformers" through system calls, shelling out to Python code with backticks from Ruby, to create embeddings with "MiniLM-L6-v2", as I couldn't figure out how to work with embeddings and sentence transformers in Ruby. There are no Ruby gems that do what "sentence-transformers" does in Python.
Have you looked at our Hugging Face LLM provider's embedding method? It uses the MiniLM-L6-v2 model by default to generate embeddings, so it should produce the same embeddings as the Python lib. This is the model on HF: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.
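Usage is roughly this (constructor keywords may shift between releases, so treat it as a sketch):

```ruby
require "langchain"

# Langchain.rb's Hugging Face provider calls the hosted inference API,
# so an API token is required.
llm = Langchain::LLM::HuggingFace.new(api_key: ENV["HUGGING_FACE_API_TOKEN"])

# Defaults to sentence-transformers/all-MiniLM-L6-v2 under the hood.
embedding = llm.embed(text: "Ruby is a programmer's best friend")
```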
Regarding validating your answers: it's a bit more complicated, as you'd have to provide more info on the type of data you're indexing, etc.
@mldev94 We added support for LlamaCpp (using this gem) to Langchain.rb: https://github.com/andreibondarev/langchainrb/blob/main/lib/langchain/llm/llama_cpp.rb
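Usage looks roughly like this; check the linked class for the current constructor options, as the keywords here are from memory:

```ruby
require "langchain"

llm = Langchain::LLM::LlamaCpp.new(
  model_path: ENV["LLAMACPP_MODEL_PATH"],
  n_gpu_layers: 1, # how many layers to offload to the GPU, if any
  n_ctx: 2048      # context window size
)

llm.complete(prompt: "Hi!")
```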