
Extract clean markdown from PDFs, URLs, Word docs, slides, videos, and more, ready for any LLM. ⚡

23 thepipe issues

If Tesseract OCR is not installed correctly, image extraction with `text_only=True` fails with `tesseract is not installed or it's not in your PATH. See README file for more information.`. This...

bug
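One way to rule out this cause is to check whether the `tesseract` executable is visible on `PATH` before running OCR. The sketch below is a minimal pre-flight check that uses only Python's standard `shutil.which`. The helper name `tesseract_available` is hypothetical and is not part of thepipe's API.

```python
import shutil


def tesseract_available(cmd: str = "tesseract") -> bool:
    """Return True if the given executable can be found on PATH.

    Hypothetical helper, not part of thepipe: it only checks that the binary
    is discoverable, not that it runs or has the right language data.
    """
    # shutil.which searches PATH the way a shell would and returns None if not found
    return shutil.which(cmd) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found on PATH")
    else:
        print("Tesseract not found; install it or add its directory to PATH before using OCR")
```

If the check fails, installing Tesseract through the system package manager, or adding its install directory to `PATH`, should be tried before running extraction again.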

I'm running thepipe locally to extract some page URLs for processing with GPT-4o, and the image generated for each page seems to capture only the content above the fold...

I was wondering whether it is possible to extract all images from a document and reference them at their position in the generated markdown? As I understand the documentation it...

The results section of the app is raising a KeyError. I've been processing the same files as before, and this issue only started today.

When processing a .PPT file that contains a transparent image through the pipe (I'm using a local installation), the following error is thrown: `error":"/usr/local/lib/python3.11/site-packages/PIL/Image.py:1056: UserWarning: Palette images...

This builds on the globbing-branch PR I submitted earlier (https://github.com/emcf/thepipe/pull/26). It replaces pytube and greatly expands the number of websites supported for automatically scraping videos. I also attach...

This version adds glob-based file filtering within the directories you scrape. These changes are also already included in the forthcoming yt-dlp pull request....

When running in the Python interpreter, I don't see any output. Is the converted file saved somewhere? Why doesn't the README mention this?

I am trying to perform extraction on a PDF file. I can scrape the file with the tool, but when I try to extract the information I get...