GenerativeAIExamples
Knowledge Graph RAG: fix setup
First of all, thank you for sharing these projects with the public! I tried running your knowledge graph RAG on a fresh WSL install of Ubuntu 22.04.4 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64). In this PR, I have documented the changes I had to make in order to run the code. I am not sure that every step is correct or necessary in all environments, but I hope these improvements to the README and code make life easier for the next person trying out this project.
Please let me know how these fixes should be adapted so that they can be merged into the repository :)
External Dependencies
sudo apt install poppler-utils ffmpeg libsm6 libxext6 tesseract-ocr libtesseract-dev
I had to install some system packages that the Python packages rely on.
- poppler-utils is necessary to read PDF files. Python packages such as pdf2image use it.
- ffmpeg libsm6 libxext6 are common cv2 dependencies. Before installing them, I got the error ImportError: libGL.so.1: cannot open shared object file: No such file or directory when trying to process files into the system. This list of packages works, but there might be a more compact way to provide the necessary libraries.
- tesseract-ocr libtesseract-dev are necessary for pytesseract, which is used to parse a PDF into a string.
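To make a missing system package fail early instead of surfacing as an ImportError in the middle of preprocessing, a startup check along the following lines might help. This is only a sketch: the helper name missing_tools and the mapping from executables to apt packages are my assumptions, not part of the repository, and the shared libraries (libsm6, libxext6) are not covered because they ship no executable to look up.

```python
import shutil

# Executables assumed to be installed by the apt packages listed above.
# This mapping is an assumption for illustration, not taken from the repository.
REQUIRED_TOOLS = {
    "pdftoppm": "poppler-utils",   # used by pdf2image to rasterize PDFs
    "tesseract": "tesseract-ocr",  # used by pytesseract for OCR
    "ffmpeg": "ffmpeg",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {executable: apt_package} for every executable not found on PATH."""
    return {exe: pkg for exe, pkg in tools.items() if which(exe) is None}
```

Calling missing_tools() once at startup and printing a "sudo apt install ..." hint for the returned packages would point the next user straight at the fix.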
Changes in requirements.txt
Requests==2.31.0
The requirements.txt file specifies Requests==2.32.3, but another specified dependency requires 2.31.0, which leaves pip unable to resolve the dependencies.
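pip reports this kind of conflict on its own, but for illustration, the situation can be sketched as a small check that collects exact pins from several sources and lists the packages pinned to more than one version. The function name find_conflicting_pins and the source label "hypothetical-dependency" are placeholders I made up; I did not record which dependency actually requires 2.31.0.

```python
from collections import defaultdict


def find_conflicting_pins(requirement_sources):
    """Return {package: {version: [sources]}} for packages pinned to more than one version.

    requirement_sources maps a source label to a list of requirement lines.
    Only exact pins of the form name==version are considered.
    """
    pins = defaultdict(lambda: defaultdict(list))
    for source, lines in requirement_sources.items():
        for line in lines:
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if "==" not in line:
                continue
            name, version = line.split("==", 1)
            # Ignore extras such as pymilvus[model] and compare names case-insensitively.
            name = name.split("[", 1)[0].strip().lower()
            pins[name][version.strip()].append(source)
    return {name: dict(versions) for name, versions in pins.items() if len(versions) > 1}


# Illustrative input only: the second source is a placeholder.
example_sources = {
    "requirements.txt": ["Requests==2.32.3", "pymilvus[model]==2.4.3"],
    "hypothetical-dependency": ["requests==2.31.0"],
}
conflicts = find_conflicting_pins(example_sources)
```

With the pin changed to Requests==2.31.0, the same check returns an empty dictionary.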
pymilvus[model]==2.4.3
Without the model extra, preprocessing of files eventually fails with a runtime exception.
Changes in the code
import nltk
nltk.download('averaged_perceptron_tagger')
I added this snippet at the top of the code simply to make sure the necessary NLTK data files are present when needed. I am sure there is a more suitable spot for this. Please let me know!
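One possible refinement, offered as a sketch rather than a finished fix: check whether the tagger data is already installed and download it only when it is missing, so that the download is not attempted on every start. The helper name ensure_nltk_resource is my own; nltk.data.find and nltk.download are the standard NLTK calls, and the injectable find and download parameters exist only to make the helper easy to test.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download an NLTK package only if its data is not already installed.

    Returns True if a download was triggered, False if the data was found.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the real NLTK functions are needed

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        download(package, quiet=True)
        return True
```

Calling ensure_nltk_resource("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger") once before the first preprocessing step would replace the unconditional download above.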
from utils.preprocessor import extract_triples
Running the project as described in the README led to import errors. Using the full module path in the import statements fixed them for me.