markitdown optional dependency installation

Open Robert-Jia00129 opened this issue 9 months ago • 3 comments

Initial Error:

Traceback (most recent call last):
  File "/Users/jiazhenghao/CodingProjects/research/SocSim/pdf2sim.py", line 6, in <module>
    result = md.convert(paper_path)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 258, in convert
    return self.convert_local(source, stream_info=stream_info, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 312, in convert_local
    return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 540, in _convert
    raise FileConversionException(attempts=failed_attempts)
markitdown._exceptions.FileConversionException: File conversion failed after 1 attempts:
 - PdfConverter threw MissingDependencyException with message: PdfConverter recognized the input as a potential .pdf file, but the dependencies needed to read .pdf files have not been installed. To resolve this error, include the optional dependency [pdf] or [all] when installing MarkItDown. For example:

* pip install markitdown[pdf]
* pip install markitdown[all]
* pip install markitdown[pdf, ...]
* etc.

Tried pip install markitdown[all]

Produced Error:

zsh: no matches found: markitdown[all]

Fix: quote the argument so that zsh does not treat the square brackets as a glob pattern: pip install 'markitdown[all]'
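For anyone wondering why the unquoted form fails: zsh reads [all] as a character class and, by default, aborts when no file matches the pattern. The sketch below is only an illustration, and it uses Python's fnmatch as a stand-in for shell-style matching (an assumption, since zsh's globbing is not identical to fnmatch). It also shows shlex.quote producing the quoted form, and an argument list that would avoid the shell altogether. The names requirement, quoted and install_argv are made up for this example.

```python
import fnmatch
import shlex
import sys

requirement = "markitdown[all]"

# As a glob, [all] is a character class meaning "one of: a, l".
# The pattern therefore matches names like "markitdowna",
# but not the literal text "markitdown[all]".
print(fnmatch.fnmatchcase("markitdowna", requirement))
print(fnmatch.fnmatchcase("markitdown[all]", requirement))

# shlex.quote adds the quoting a POSIX shell needs to keep the brackets literal.
quoted = shlex.quote(requirement)
print(f"pip install {quoted}")

# An argument list handed to subprocess.run (without shell=True) never goes
# through a shell, so no glob expansion can happen. Not executed here.
install_argv = [sys.executable, "-m", "pip", "install", requirement]
print(install_argv)
```

Using sys.executable with -m pip also ensures the package lands in the same interpreter that will later import markitdown.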

Robert-Jia00129 avatar Mar 25 '25 02:03 Robert-Jia00129

I have the same issue (but for DocxConverter instead of PdfConverter).

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.9/x64/bin/markitdown", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/__main__.py", line 197, in main
    result = markitdown.convert(
             ^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 260, in convert
    return self.convert_local(source, stream_info=stream_info, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 314, in convert_local
    return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 600, in _convert
    raise FileConversionException(attempts=failed_attempts)
markitdown._exceptions.FileConversionException: File conversion failed after 1 attempts:
 - DocxConverter threw MissingDependencyException with message: DocxConverter recognized the input as a potential .docx file, but the dependencies needed to read .docx files have not been installed. To resolve this error, include the optional dependency [docx] or [all] when installing MarkItDown. For example:

* pip install markitdown[docx]
* pip install markitdown[all]
* pip install markitdown[docx, ...]
* etc.

I use an Azure Linux pipeline to run these two steps:

- bash: |
    echo Installing MarkItDown...
    pip install 'markitdown[all]'
  displayName: 'Install MarkItDown'

- bash: |
    echo Using MarkItDown to process markdown files...
    markitdown ${{ parameters.pathToFile }} > "$(Build.ArtifactStagingDirectory)/${{ parameters.outputFile }}.md"
  displayName: 'Run MarkItDown'

So the fix shown above doesn't seem to work. Does anyone have any thoughts on why?

Vloon avatar Mar 25 '25 12:03 Vloon

hmmm, how do you folks typically install packages with optional dependencies?

Indeed, with zsh or fish, quoting is necessary and should be sufficient: pip install 'markitdown[all]'

You could also write a requirements.txt with:

markitdown[all]

And then do pip install -r requirements.txt
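Whichever install route is used, one way to check what an extra such as [docx] or [pdf] was meant to pull in is to read the installed package's metadata. This is a minimal sketch, assuming the metadata lists extras with markers of the form extra == "name". The helper requirements_for_extra and the sample lines are hypothetical and do not reflect markitdown's real dependency list.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def requirements_for_extra(requirement_lines, extra):
    """Return the requirement specs whose marker mentions the given extra."""
    marker_re = re.compile(r"extra\s*==\s*['\"]" + re.escape(extra) + r"['\"]")
    selected = []
    for line in requirement_lines:
        # Metadata lines look like "name>=1.0 ; extra == 'docx'".
        spec, _, marker = line.partition(";")
        if marker_re.search(marker):
            selected.append(spec.strip())
    return selected


# Hypothetical metadata lines, for illustration only.
sample = [
    "beautifulsoup4",
    'example-pdf-lib; extra == "pdf"',
    "example-docx-lib>=1.0 ; extra == 'docx'",
]
print(requirements_for_extra(sample, "docx"))

# Against a real environment: read what the installed markitdown declares.
try:
    declared = requires("markitdown") or []
except PackageNotFoundError:
    declared = []  # markitdown is not installed in this interpreter
print(requirements_for_extra(declared, "docx"))
```

If the second list names packages that cannot be imported in the same interpreter, the extra was probably not installed, whatever the install log suggested.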

afourney avatar Mar 25 '25 17:03 afourney

Thanks for the reply. Apparently the issue was the type of quotes (as it quite often is in Azure Pipelines): pip install "markitdown[all]" worked, whereas pip install 'markitdown[all]' did not.
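For reference, under plain POSIX quoting rules the two forms should give pip exactly the same argument. The sketch below uses Python's shlex.split to approximate how a POSIX shell splits each line. If that holds for bash, the difference seen here presumably arose somewhere before bash received the script, though that part is a guess and was not verified in this thread.

```python
import shlex

# Split each command line the way a POSIX shell would tokenize it.
single = shlex.split("pip install 'markitdown[all]'")
double = shlex.split('pip install "markitdown[all]"')

print(single)
print(double)
print(single == double)
```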

Vloon avatar Mar 31 '25 09:03 Vloon