GenerativeAIExamples

Knowledge Graph RAG: fix setup

Open gschup opened this issue 7 months ago • 0 comments

First of all, thank you for sharing these projects with the public! I tried running your knowledge graph RAG on a fresh WSL install of Ubuntu 22.04.4 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64). This PR documents the changes I had to make to get the code running. I am not sure that every step is correct in all cases, but I hope these improvements to the README and code make life easier for the next person trying out this project.

Please let me know how these fixes should be adapted so that they can be merged into the repository :)

External Dependencies

sudo apt install poppler-utils ffmpeg libsm6 libxext6 tesseract-ocr libtesseract-dev

I had to install some external system packages that the Python packages call into.

  • poppler-utils is needed to read PDF files. Python packages such as pdf2image use it.
  • ffmpeg, libsm6 and libxext6 are common cv2 dependencies. Before installing them, I got an ImportError: libGL.so.1: cannot open shared object file: No such file or directory when trying to load files into the system. This set of packages works, but there might be a more compact way to provide the necessary libraries.
  • tesseract-ocr and libtesseract-dev are needed for pytesseract, which is used to parse a PDF into a string.
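
As a quick sanity check before running the project, the sketch below looks for the command-line tools that the packages above provide. It is only an illustration, not part of this PR: the function name and the mapping are my own, and libsm6, libxext6 and libtesseract-dev are shared libraries without an executable, so they are not covered.

```python
import shutil

# Executable name -> apt package that provides it.
# Only packages that ship a command-line tool can be checked this way.
REQUIRED_TOOLS = {
    "pdftoppm": "poppler-utils",   # called by pdf2image
    "tesseract": "tesseract-ocr",  # called by pytesseract
    "ffmpeg": "ffmpeg",
}


def missing_apt_packages(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the apt packages whose executable is not found on PATH."""
    return sorted(pkg for exe, pkg in tools.items() if which(exe) is None)


if __name__ == "__main__":
    missing = missing_apt_packages()
    if missing:
        print("Missing tools; try: sudo apt install " + " ".join(missing))
    else:
        print("All checked tools are on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out in a test; in normal use the default `shutil.which` is enough.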

Changes in requirements.txt

Requests==2.31.0

The requirements.txt file pins Requests==2.32.3, but another listed dependency requires 2.31.0, so pip cannot resolve the dependencies.
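
To illustrate this kind of clash, here is a minimal sketch that reports packages pinned to more than one exact version across a set of requirement lines. It is a simplification under stated assumptions: it only understands `name==version` pins, it is not how pip resolves dependencies, and the helper name is hypothetical. The sample input stands in for "requirements.txt plus the pin demanded by the other dependency".

```python
import re
from collections import defaultdict

# Matches "name==version" and "name[extra]==version"; other specifiers are ignored.
PIN = re.compile(r"^\s*([A-Za-z0-9._-]+)(?:\[[^\]]*\])?\s*==\s*([^\s;#]+)")


def conflicting_pins(requirement_lines):
    """Return {package: sorted versions} for packages pinned to several versions."""
    pins = defaultdict(set)
    for line in requirement_lines:
        match = PIN.match(line)
        if match:
            # Package names are case-insensitive, and "_" and "-" are equivalent.
            name = match.group(1).lower().replace("_", "-")
            pins[name].add(match.group(2))
    return {name: sorted(versions) for name, versions in pins.items() if len(versions) > 1}


if __name__ == "__main__":
    lines = [
        "Requests==2.32.3",        # pin in requirements.txt before this PR
        "requests==2.31.0",        # pin required by the other dependency
        "pymilvus[model]==2.4.3",
    ]
    print(conflicting_pins(lines))
```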

pymilvus[model]==2.4.3

Without the model extra, file preprocessing eventually fails with a runtime exception.

Changes in the code

import nltk
nltk.download('averaged_perceptron_tagger')

I added this snippet at the top of the code simply to make sure the necessary files are there when needed. I am sure there is a more suitable spot for it. Please let me know!
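
One possible alternative, sketched here rather than proposed as the final fix, is to download the tagger only when it is not already installed, so the download does not run on every import. The helper name and its `find`/`download` parameters are my own; by default it uses nltk.data.find and nltk.download. The resource path taggers/averaged_perceptron_tagger is an assumption that matches the package name above, and newer NLTK releases may expect a different tagger package.

```python
def ensure_nltk_resource(resource_path, package_name, find=None, download=None):
    """Download an NLTK package only if its resource is missing.

    Returns True if a download was triggered, False if the resource was found.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the resource is absent
        return False
    except LookupError:
        download(package_name, quiet=True)
        return True


if __name__ == "__main__":
    ensure_nltk_resource(
        "taggers/averaged_perceptron_tagger",  # assumed resource location
        "averaged_perceptron_tagger",
    )
```

Calling this once from the application's startup code, rather than at the top of a module, would keep imports free of network access.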

from utils.preprocessor import extract_triples

Running the project as described in the README led to issues with the import statements. They were fixed for me by using the full path of the module structure.

gschup, Jul 12 '24 07:07