localGPT
Documentation for building and installing project dependencies.
@PromtEngineer @LeafmanZ
We need to establish detailed guides for setting up the prerequisites and building the project.
I've previously developed a similar setup with my bootstrap repository, but we need to adapt it specifically for Debian and Arch distributions and align it with this project's unique dependencies.
Given the limitations of certain dependencies on Windows, we should guide Windows users to set up Ubuntu on WSL and use Anaconda. This approach sidesteps some of the constraints we're currently facing.
As of now, the libraries that seem to circumvent most of our issues are ggml and llama.cpp. These libraries are advantageous because they are not hardware-specific and have no external dependencies, despite the authors' focus on the M2.
However, projects like auto_gptq that rely on bitsandbytes as a dependency may pose challenges, since bitsandbytes is designed to perform quantized 8-bit operations on Nvidia GPUs specifically.
I'm eager to hear any thoughts on this matter.
Related issues: #167
@teleprint-me I agree we need to do a much better job at documentation and provide guides to the users for different platforms.
I agree, we can provide a guide for Windows users to use WSL and anaconda. This will make maintenance easier for us as well.
I haven't had a chance to look deeper into ggml and llama.cpp yet; it's on my TODO list. A few features that I would like to keep are:
- The ability of users to simply provide a `model_id` from Hugging Face and have the script download all the required models. We don't want users to download individual models (a sketch of this follows the list below).
- The ability to run quantized models (I think `llama.cpp` can handle the `GPTQ` format???).
- The ability to use `langchain`.
- I think we can use `ggml` for users who are running models on CPU.
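On the first point, here's a minimal sketch of what the `model_id` flow could look like, assuming `huggingface_hub` is installed; the repo id and filename shown are placeholders, not real project values:

```python
# Sketch: resolve a Hugging Face model_id to local files so users
# never have to download individual model files themselves.
# Assumes the `huggingface_hub` package is installed.
from typing import Optional

from huggingface_hub import hf_hub_download, snapshot_download


def fetch_model(model_id: str, filename: Optional[str] = None) -> str:
    """Download one file when `filename` is given, otherwise the whole
    repository; return the local path in both cases."""
    if filename is not None:
        # Fetch a single artifact (e.g. a quantized .bin file).
        return hf_hub_download(repo_id=model_id, filename=filename)
    # Fetch everything in the repo (tokenizer, config, weights).
    return snapshot_download(repo_id=model_id)


# Hypothetical usage; the repo id and filename are placeholders:
# local_path = fetch_model("TheBloke/some-model-GGML", "model.ggmlv3.q4_0.bin")
```

`snapshot_download` keeps multi-file models together in one local directory, which matters for transformers-style loaders that expect tokenizer and config files alongside the weights.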
We also need to consider the performance we will get out of the hardware when we make these choices. We will probably have to benchmark these packages somehow with different models (a rough sketch follows).
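A benchmark could be as simple as timing token throughput per backend. Below is only a sketch, assuming the `llama-cpp-python` binding and a placeholder model path:

```python
# Sketch: time raw generation throughput for one backend so different
# packages/models can be compared on the same hardware.
# Assumes `llama-cpp-python` is installed; the model path is a placeholder.
import time

from llama_cpp import Llama

llm = Llama(model_path="./models/model.ggmlv3.q4_0.bin")  # placeholder path

prompt = "Explain what a vector store is in one paragraph."
start = time.perf_counter()
output = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

# llama-cpp-python returns an OpenAI-style completion dict with usage counts.
n_tokens = output["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s ({n_tokens / elapsed:.1f} tok/s)")
```

The same timing harness could be pointed at other backends to get comparable tokens-per-second numbers.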
I don't have any issues with replacing auto_gptq with another package that will be easier to maintain.
LMK what you guys think.
@PromtEngineer
I agree with your points and I have some thoughts on them:
- **Model Downloading:** I also considered the idea of users providing a `model_id` from Hugging Face and the script handling the model download. I held back on implementing it due to other libraries that automate dependency management. However, I agree that using `huggingface_hub.hf_hub_download` could be beneficial. We'll need to design a solution that integrates this functionality smoothly.
- **Quantized Models:** As per arXiv:2110.02861v2, GPTQ is an 8-bit quantization method for models. The `llama.cpp` README states it supports 8-bit models, but I haven't delved into the source code yet. As I'm currently learning C++ and refreshing my C knowledge, it might take some time for me to fully understand the implementation.
- **Langchain:** Absolutely, Langchain is a crucial component of our project, and we'll continue to rely on it (see the sketch after this list for how it could tie into `llama.cpp`).
- **GGML for CPU Users:** GGML's support for CPU, CUDA, Metal, and OpenCL is indeed promising. I'm particularly interested in the progress made with OpenCL by @Codes4Fun. In the future, I plan to explore ROCm support, which I've already started looking into.
- **Replacing auto_gptq:** I'm open to keeping auto_gptq, with the caveat that we clearly state it's specifically designed for Nvidia GPUs. We'll need to adjust our options to accommodate this.
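On the Langchain and GGML points above, here is a minimal sketch of how a local GGML model could be served through Langchain via `llama-cpp-python`; `LlamaCpp` is Langchain's llama.cpp wrapper, and the model path is a placeholder:

```python
# Sketch: expose a local GGML model to Langchain through llama.cpp,
# so CPU users get the same Langchain interface as everyone else.
# Assumes `langchain` and `llama-cpp-python` are installed;
# the model path is a placeholder.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/model.ggmlv3.q4_0.bin",  # placeholder path
    n_ctx=2048,       # context window size
    temperature=0.7,  # sampling temperature
)

print(llm("What does localGPT do?"))
```

Since `LlamaCpp` implements Langchain's standard LLM interface, it should drop into the existing chains without touching the rest of the pipeline.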
Looking forward to hearing any thoughts on these points.
@teleprint-me I agree with your points. These are the same things I would like to add.
For the time being, I would like to merge #180 once you're able to fix some of the issues I highlighted here.
I think @LeafmanZ's API and GUI code is not fully ported yet, right? Would love to get this update out.
After that, we can create a roadmap to start making additional changes and start adding more support as mentioned above.
I think documentation will play a key role here. Let me know what you guys think.
I am awaiting this update to be pushed out, and then I will rebuild the API and UI around the new architecture. Should take < 24 hrs to get it fixed up, once we have the new build pushed.
@PromtEngineer @LeafmanZ @DeutscheGabanna @shiliu2023
I've added documentation and a docs path to the repository in the dev branch.
They're not as detailed as they will be (I'm leaving all the other details up to the community), but should be enough to get everyone up and running.
This should streamline documentation contributions as well.
For anyone who's interested, you can read it here.
It's using mkdocs with the material theme. There are 3 themes; material is the most appealing IMHO. There are community-created themes as well, but I wanted to leave that up to others to decide.
You'll want to take a look at the documentation for more information: https://www.mkdocs.org/getting-started/