PDF2TextConverter is difficult to use on Windows

Open AIAnytime opened this issue 1 year ago • 5 comments

Describe the bug: PDF2TextConverter is a big pain to use with Haystack on a Windows machine.

AIAnytime, Sep 19 '23 19:09

Hey @AIAnytime! The description of the issue doesn't really match your title, so I'm going to edit it to match. If you're reporting an actual issue, could you be more specific about your problem? That way we can help. If instead you'd rather have a general discussion about the shortcomings of Haystack vs LangChain or about its PDF conversion capabilities, we normally do that on Discord or in GitHub Discussions.

ZanSara, Sep 20 '23 09:09

XPDF is a big hurdle to work with on Windows. Is there anything on the roadmap for using libraries like pypdf, PyPDF2, etc.?

AIAnytime, Sep 20 '23 18:09

pypdf is being added to the upcoming 2.0 release right now: https://github.com/deepset-ai/haystack/pull/5850

masci, Sep 21 '23 06:09

Superb. When will the updated version be released?

AIAnytime, Sep 21 '23 20:09

It's already available as part of the 2.0 preview package. Unfortunately, we still lack proper documentation on this front (and we're working on it). To get it, you can either:

  1. Install farm-haystack from main, OR
  2. Run pip install haystack-ai (it's released whenever new components are added).
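If you're unsure which of the two options above ended up in your environment, here is a small sketch using only the standard library's `importlib.metadata`. It checks the two PyPI distribution names given above; the helper name `installed_haystack_distributions` is hypothetical.

```python
# Sketch only: report which Haystack distributions are installed, by PyPI name.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# The two distribution names mentioned in the options above.
HAYSTACK_DISTRIBUTIONS = ("farm-haystack", "haystack-ai")


def installed_haystack_distributions() -> Dict[str, str]:
    """Hypothetical helper: map each installed distribution to its version."""
    found: Dict[str, str] = {}
    for name in HAYSTACK_DISTRIBUTIONS:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            continue  # this distribution is not installed here
    return found
```

An empty result means neither distribution is installed in the active environment.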

In the second case, you will get only the contents of the preview package, which is quite unstable right now. To learn more about the migration, have a look at this Discussion: https://github.com/deepset-ai/haystack/discussions/5568. As the documentation becomes available, we'll notify the community, and it will get easier to use :blush:

ZanSara, Sep 25 '23 12:09