chatgpt-retrieval icon indicating copy to clipboard operation
chatgpt-retrieval copied to clipboard

Use .env files and import pypdf

Open jasuca opened this issue 1 year ago • 2 comments

Added pypdf as requirement to read pdf Created a requirements.txt file Use .env to avoid commit OpenAI keys

jasuca avatar Jun 29 '23 14:06 jasuca

  • Is pypdf necessary, or does unstructured include that in its dependencies?
  • The dependencies in requirements.txt (wrapt, ipython, pythondotenv) don't seem to match the pip install command...?
  • typo on "You can also use" to "You can use"

techleadhd avatar Jun 29 '23 21:06 techleadhd

  • pypdf is necessary. Without manually using was not reading pdf correctly.
  • Sorry, I updated the command. iphython is not necessary. wrapt is necessary to enforce the 1.15. The 1.14 doesn't support the latest python version. pythondotenv is for the .env support
  • Typo fixed

jasuca avatar Jul 01 '23 19:07 jasuca

I like the idea adding the missing requirements.txt to the project, many of us misses it.

StrictLine avatar Jul 12 '23 22:07 StrictLine

My comments here are similar to https://github.com/techleadhd/chatgpt-retrieval/pull/31.

This might be a matter of preference, but to keep things simple, I'm favoring keeping the api key in the code, since setting up env vars is one more thing to learn for new folks. Not using the env vars also makes the code more compatible with web apps, which may not be running under the user's environment. Let me know if there is a strong reason otherwise.

techleadhd avatar Aug 19 '23 20:08 techleadhd

Should we create another PR for the missing dependencies?

StrictLine avatar Aug 20 '23 10:08 StrictLine

i was able to run this without pythondotenve or wrapt... are you sure those are needed...?

techleadhd avatar Aug 20 '23 17:08 techleadhd

No, for me pypdf was necessary, but I'd need to re-test it.

StrictLine avatar Aug 21 '23 10:08 StrictLine

No, for me pypdf was necessary, but I'd need to re-test it.

✗ python chatgpt.py "what is my cat's name"
Using embedded DuckDB without persistence: data will be transient
I'm sorry, but I don't know what your cat's name is.

This is the result after a fresh cloning from GitHub and using venv with the installation command in the README: pip install langchain openai chromadb tiktoken unstructured

But my assumption was wrong, installing the pypdf does not solve the problem. It's related to the DirectoryLoader, which "uses the UnstructuredLoader" under the hood, but directly using this works like a charm. So the underlying implementation has changed without updating the documentation.

StrictLine avatar Aug 23 '23 20:08 StrictLine