thepipe
Extract clean markdown from PDFs, URLs, Word docs, slides, videos, and more, ready for any LLM. ⚡
Thoughts on a scan feature that prints file types of the directory/file selected for Piping without extracting any data? It would be clearer what file types are causing failure if...
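A rough sketch of what such a scan could look like, using only the Python standard library. The function name `scan_file_types` is hypothetical and is not part of thepipe; it only counts file extensions and never opens a file.

```python
from collections import Counter
from pathlib import Path


def scan_file_types(root: str) -> Counter:
    """Count file extensions under `root` without reading any file contents.

    Hypothetical helper for illustration; not an existing thepipe function.
    """
    path = Path(root)
    # Accept either a single file or a directory that is walked recursively.
    files = [path] if path.is_file() else [p for p in path.rglob("*") if p.is_file()]
    # Normalise case so ".PDF" and ".pdf" land in the same bucket.
    return Counter(p.suffix.lower() or "<no extension>" for p in files)
```

Printing `scan_file_types(".").most_common()` before piping would show which extensions are present, which may help narrow down the file type behind a failure.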
Multiple Questions: What are the resources recommended/required for local extraction? When running locally can you provide us the option to expose a port and receive POST requests? That way we...
thepipe https://www.linkedin.com/in/spencer-reitsma-8a3938151/ Extracting from website... Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "C:\Python312\Scripts\thepipe.exe\__main__.py", line 7, in <module> File "C:\Users\Spenc\AppData\Roaming\Python\Python312\site-packages\thepipe_api\thepipe.py", line...
Hi! Not sure if this is a bug or a feature, but I'd love to use the `ai_extraction` option to improve the handling of PDF documents. However, enabling this option...
Added plaintext extraction rule for '.ino', '.ini', '.cfg', and '.log' files.
Currently, `ignore` accepts only one file type to be ignored.
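One way multiple ignored types could be handled is to accept a comma-separated list and normalise it into a set of extensions. This is only a sketch: `parse_ignore` and `filter_ignored` are hypothetical names, and the comma-separated input format is an assumption rather than thepipe's actual interface.

```python
from pathlib import Path
from typing import Iterable


def parse_ignore(value: str) -> set[str]:
    """Turn a string like '.log, ini,.CFG' into {'.log', '.ini', '.cfg'}.

    Hypothetical helper; the comma-separated format is an assumption.
    """
    extensions = set()
    for part in value.split(","):
        part = part.strip().lower()
        if part:
            # Tolerate entries written with or without the leading dot.
            extensions.add(part if part.startswith(".") else "." + part)
    return extensions


def filter_ignored(paths: Iterable[str], ignore: set[str]) -> list[str]:
    """Keep only the paths whose extension is not in the ignore set."""
    return [p for p in paths if Path(p).suffix.lower() not in ignore]
```

For example, `filter_ignored(files, parse_ignore(".log,.ini"))` would drop both log and ini files in a single pass.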
Add .ino functionality for GitHub repos related to Arduino
Accepting feature requests in this thread, please feel free to suggest! The roadmap so far includes: - Cloud storage extraction (Google Drive, OneDrive) - E-Commerce platform extraction (Amazon) - Markdown...
I was looking at your pipeline and thought you might be better served by using https://github.com/Vaibhavs10/insanely-fast-whisper or allow a bit of wiggle room in your framework to allow an optional...
When you have a video over 60 seconds it splits it into chunks like: [00:55.800 --> 00:59.960] insert text here but for a video which was originally longer than 60...
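If the concern is that timestamps restart from zero inside each 60-second chunk (an assumption, since the report is cut off), one possible remedy is to shift each chunk's timestamps by that chunk's start offset. The sketch below assumes the `MM:SS.mmm` format shown above; `shift_timestamps` is a hypothetical helper, not part of thepipe.

```python
import re

# Matches timestamps in the MM:SS.mmm form shown in the transcript line.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2})\.(\d{3})")


def shift_timestamps(line: str, offset_seconds: float) -> str:
    """Add `offset_seconds` to every MM:SS.mmm timestamp in `line`.

    Hypothetical helper; assumes chunk-relative timestamps.
    """
    def shift(match: re.Match) -> str:
        minutes, seconds, millis = (int(g) for g in match.groups())
        # Work in integer milliseconds to avoid float rounding drift.
        total_ms = (minutes * 60 + seconds) * 1000 + millis
        total_ms += round(offset_seconds * 1000)
        new_minutes, remainder = divmod(total_ms, 60_000)
        new_seconds, new_millis = divmod(remainder, 1000)
        return f"{new_minutes:02d}:{new_seconds:02d}.{new_millis:03d}"

    return TIMESTAMP.sub(shift, line)
```

Under these assumptions, lines from the second chunk (index 1) would be shifted by `1 * 60` seconds, so `[00:55.800 --> 00:59.960]` becomes `[01:55.800 --> 01:59.960]`.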