haystack
PDF2TextConverter is difficult to use on Windows
Describe the bug PDF2TextConverter is a big pain to use with Haystack on a Windows machine.
Error message Error that was thrown (if available)
Expected behavior A clear and concise description of what you expected to happen.
Additional context Add any other context about the problem here, like document types / preprocessing steps / settings of reader etc.
To Reproduce Steps to reproduce the behavior
FAQ Check
- [ ] Have you had a look at our new FAQ page?
System:
- OS:
- GPU/CPU:
- Haystack version (commit or version number):
- DocumentStore:
- Reader:
- Retriever:
Hey @AIAnytime! The description of the issue doesn't really match your title, so I'm going to edit it to correspond. If you're reporting an actual issue, could you be more specific about your problem? That way we can help. If instead you'd rather have a general discussion about the shortcomings of Haystack vs LangChain or the PDF conversion capabilities, we normally do that on Discord or in the GitHub Discussions.
XPDF is a big hurdle to work with on Windows. Do you have anything on the roadmap for using libraries like pypdf, PyPDF2, etc.?
pypdf is being added to the upcoming release 2.0 right now: https://github.com/deepset-ai/haystack/pull/5850
Superb. When will we have the updated version released?
It's already available as part of the 2.0 preview package. Unfortunately we're still lacking proper documentation on this front (and we're working on it). To get it, you can either:
- Install `farm-haystack` from main, OR
- Do `pip install haystack-ai` (it's released as often as new components get added).
In the second case you will only get the content of the `preview` package, which right now is quite unstable. To learn more about the migration, have a look at this Discussion: https://github.com/deepset-ai/haystack/discussions/5568. As the documentation becomes available we'll notify the community about it and it will get easier to use :blush:
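
For anyone who just needs text out of a PDF on Windows in the meantime, here is a minimal sketch that calls pypdf directly, with no xpdf binary involved. To be clear, this is not the Haystack 2.0 component from the PR linked above, only the underlying library. The helper names `pdf_to_text` and `join_pages` are made up for illustration, and it assumes you have run `pip install pypdf`.

```python
from pathlib import Path


def join_pages(page_texts):
    """Join per-page text, skipping pages that yielded no text (hypothetical helper)."""
    return "\n\n".join(t.strip() for t in page_texts if t and t.strip())


def pdf_to_text(pdf_path):
    """Extract plain text from a PDF with pypdf (hypothetical helper, not a Haystack API)."""
    # Imported here so a missing pypdf install only fails when this function is called.
    from pypdf import PdfReader

    reader = PdfReader(str(Path(pdf_path)))
    # extract_text() returns one string per page; it may be empty for scanned pages.
    return join_pages(page.extract_text() for page in reader.pages)
```

Calling `pdf_to_text("sample.pdf")` (with your own file path) should return the document text as a single string, which you can then wrap in whatever document object your pipeline expects. Scanned PDFs without a text layer will come back empty, since pypdf does no OCR.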