Could not convert stream / PDF to Markdown
Hey there,
I wanted to generate Markdown from a really long PDF document (roughly 100 pages). A simple print works, but as soon as it is converted to Markdown, it raises the error below. Is there a known limit on the length of a document?
Traceback (most recent call last):
File "/Users/user/Desktop/Repositories/markitdown/script/markdown.py", line 73, in
Thanks for the report. Let's get to the bottom of this.
What version of the library are you using? Did you install it with [all] or at least [pdf]? Is this a problem with all (e.g., smaller) PDFs? Or just this one? Are you using the python library or the command line?
On my plate is to add a debug option and more python logging, to better support debugging these types of scenarios.
Seeing the same. Installed markitdown version 0.1.1.
Using "pip install -e packages/markitdown[all]" returns "zsh: no matches found: packages/markitdown[all]", and similarly for [pdf] and other options.
The only install command that didn't fail was this (below), but it leads to something like OP's reported error above when used:
pip install -e packages/markitdown
Obtaining file:///users/name/localpath/somedir/markitdown/packages/markitdown
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Installing backend dependencies ... done
Preparing editable metadata (pyproject.toml) ... done
==================== Traceback:
/opt/anaconda3/lib/python3.12/site-packages/executing/executing.py:713: DeprecationWarning: ast.Str is deprecated and will be removed in Python 3.14; use ast.Constant instead
right=ast.Str(s=sentinel),
/opt/anaconda3/lib/python3.12/ast.py:587: DeprecationWarning: Attribute s is deprecated and will be removed in Python 3.14; use value instead
return Constant(*args, **kwargs)
---------------------------------------------------------------------------
FileConversionException Traceback (most recent call last)
Cell In[2], line 2
1 md = MarkItDown()
----> 2 result = md.convert('../test_report.pdf')
File ~/some-path-to-here/markitdown/packages/markitdown/src/markitdown/_markitdown.py:273, in MarkItDown.convert(self, source, stream_info, **kwargs)
271 return self.convert_uri(source, stream_info=stream_info, **_kwargs)
272 else:
--> 273 return self.convert_local(source, stream_info=stream_info, **kwargs)
274 # Path object
275 elif isinstance(source, Path):
File ~/some-path-to-here/markitdown/packages/markitdown/src/markitdown/_markitdown.py:327, in MarkItDown.convert_local(self, path, stream_info, file_extension, url, **kwargs)
323 with open(path, "rb") as fh:
324 guesses = self._get_stream_info_guesses(
325 file_stream=fh, base_guess=base_guess
326 )
--> 327 return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
File ~/some-path-to-here/markitdown/packages/markitdown/src/markitdown/_markitdown.py:613, in MarkItDown._convert(self, file_stream, stream_info_guesses, **kwargs)
611 # If we got this far without success, report any exceptions
612 if len(failed_attempts) > 0:
--> 613 raise FileConversionException(attempts=failed_attempts)
615 # Nothing can handle it!
616 raise UnsupportedFormatException(
617 f"Could not convert stream to Markdown. No converter attempted a conversion, suggesting that the filetype is simply not supported."
618 )
FileConversionException: File conversion failed after 1 attempts:
- PdfConverter threw MissingDependencyException with message: PdfConverter recognized the input as a potential .pdf file, but the dependencies needed to read .pdf files have not been installed. To resolve this error, include the optional dependency [pdf] or [all] when installing MarkItDown. For example:
* pip install markitdown[pdf]
* pip install markitdown[all]
* pip install markitdown[pdf, ...]
* etc.
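One way to confirm this diagnosis before converting is to check whether the optional PDF dependency can be imported in the active environment. This is only a sketch: it assumes the [pdf] extra provides pdfminer.six (importable as `pdfminer`), and `missing_modules` is a hypothetical helper, not part of MarkItDown.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name
        for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumption: the [pdf] extra installs pdfminer.six, imported as "pdfminer".
missing = missing_modules(["pdfminer"])
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
else:
    print("PDF dependencies appear to be installed.")
```

If the module is reported missing, the conversion will likely fail with the MissingDependencyException shown above, regardless of the size of the PDF.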
Ignore my previous comment, it was a "me" issue. Referencing it here in case anyone runs into the same thing. Adding quotation marks around the target ( 'markitdown[all]' ) allowed a proper install.
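For anyone wondering why the quotes matter: zsh treats the unquoted square brackets as a glob character class and gives up when no file matches. The sketch below illustrates this with Python's standard `fnmatch` and `shlex` modules. It approximates shell globbing and is not what pip or zsh actually run. The candidate name "packages/markitdowna" is made up for the illustration.

```python
import fnmatch
import shlex

target = "packages/markitdown[all]"

# Unquoted, [all] is read as a character class: exactly one of 'a' or 'l'.
# A (made-up) path ending in a single 'a' would therefore match the pattern.
matches_glob_candidate = fnmatch.fnmatchcase("packages/markitdowna", target)

# The literal text "packages/markitdown[all]" does not match its own pattern,
# so with no matching file the shell reports "no matches found".
matches_itself = fnmatch.fnmatchcase(target, target)

# shlex.quote returns a form the shell passes to pip unchanged.
quoted = shlex.quote(target)
print(f"pip install -e {quoted}")
```

Quoting only the bracketed part, or the whole argument, should have the same effect. What matters is that the shell does not see bare brackets.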