markitdown optional dependency installation
Initial Error:
Traceback (most recent call last):
  File "/Users/jiazhenghao/CodingProjects/research/SocSim/pdf2sim.py", line 6, in <module>
    result = md.convert(paper_path)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 258, in convert
    return self.convert_local(source, stream_info=stream_info, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 312, in convert_local
    return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/envs/llm-sim/lib/python3.11/site-packages/markitdown/_markitdown.py", line 540, in _convert
    raise FileConversionException(attempts=failed_attempts)
markitdown._exceptions.FileConversionException: File conversion failed after 1 attempts:
- PdfConverter threw MissingDependencyException with message: PdfConverter recognized the input as a potential .pdf file, but the dependencies needed to read .pdf files have not been installed. To resolve this error, include the optional dependency [pdf] or [all] when installing MarkItDown. For example:
* pip install markitdown[pdf]
* pip install markitdown[all]
* pip install markitdown[pdf, ...]
* etc.
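The error message says the dependencies for reading .pdf (or .docx) files are not installed in the environment running MarkItDown. A minimal sketch for checking this from that same interpreter, assuming the [pdf] extra provides the pdfminer module (from pdfminer.six) and the [docx] extra provides mammoth. The EXTRA_MODULES mapping and the missing_extras helper are hypothetical; verify the mapping against the pyproject.toml of your installed MarkItDown version.

```python
import importlib.util

# Hypothetical mapping for illustration: extra name -> top-level module it is
# ASSUMED to provide. Check markitdown's pyproject.toml for your version.
EXTRA_MODULES = {
    "pdf": "pdfminer",  # assumed to come from the pdfminer.six package
    "docx": "mammoth",  # assumed
}


def missing_extras(extra_modules=EXTRA_MODULES):
    """Return the extras whose module cannot be found by this interpreter."""
    missing = []
    for extra, module in extra_modules.items():
        # For a top-level name, find_spec returns None when it is not importable.
        if importlib.util.find_spec(module) is None:
            missing.append(extra)
    return missing


if __name__ == "__main__":
    gaps = missing_extras()
    if gaps:
        print("Missing extras:", ", ".join(gaps))
        # Quoted so that shells like zsh do not treat the brackets as a glob.
        print("Try: pip install 'markitdown[" + ",".join(gaps) + "]'")
    else:
        print("All checked extras look importable.")
```

Run it with the same python that runs your script; a different interpreter can give a different answer.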
Tried:
pip install markitdown[all]
Produced error:
zsh: no matches found: markitdown[all]
Fix (zsh treats the unquoted square brackets as a glob pattern, so the argument has to be quoted):
pip install 'markitdown[all]'
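To see why the quotes matter: unquoted, [all] is a glob character class matching a single character ('a' or 'l'), not the literal text. bash leaves an unmatched pattern unchanged by default, while zsh's default is to fail with "no matches found". The sketch below uses Python's fnmatch, which implements shell-style wildcards and only approximates zsh globbing, plus shlex, which applies POSIX shell quoting rules.

```python
import fnmatch
import shlex

pattern = "markitdown[all]"

# As a glob, the pattern matches a file literally named "markitdowna" ...
print(fnmatch.fnmatchcase("markitdowna", pattern))      # True
# ... but not the literal text "markitdown[all]".
print(fnmatch.fnmatchcase("markitdown[all]", pattern))  # False

# Quoting makes the shell pass the brackets through as one literal argument.
print(shlex.split("pip install 'markitdown[all]'"))  # ['pip', 'install', 'markitdown[all]']

# shlex.quote adds the quotes for you when building a command string.
print(shlex.quote(pattern))  # 'markitdown[all]'
```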
I have the same issue (but for DocxConverter instead of PdfConverter).
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.9/x64/bin/markitdown", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/__main__.py", line 197, in main
    result = markitdown.convert(
             ^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 260, in convert
    return self.convert_local(source, stream_info=stream_info, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 314, in convert_local
    return self._convert(file_stream=fh, stream_info_guesses=guesses, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages/markitdown/_markitdown.py", line 600, in _convert
    raise FileConversionException(attempts=failed_attempts)
markitdown._exceptions.FileConversionException: File conversion failed after 1 attempts:
- DocxConverter threw MissingDependencyException with message: DocxConverter recognized the input as a potential .docx file, but the dependencies needed to read .docx files have not been installed. To resolve this error, include the optional dependency [docx] or [all] when installing MarkItDown. For example:
* pip install markitdown[docx]
* pip install markitdown[all]
* pip install markitdown[docx, ...]
* etc.
I use an Azure Linux pipeline to run these two steps:
- bash: |
    echo Installing MarkItDown...
    pip install 'markitdown[all]'
  displayName: 'Install MarkItDown'
- bash: |
    echo Using MarkItDown to process markdown files...
    markitdown ${{ parameters.pathToFile }} > "$(Build.ArtifactStagingDirectory)/${{ parameters.outputFile }}.md"
  displayName: 'Run MarkItDown'
So the fix shown above doesn't seem to work. Does anyone have any thoughts on why?
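One generic thing worth ruling out when an install step appears to succeed but the next step still reports missing dependencies is that pip and the markitdown executable belong to different Python environments. This is a general diagnostic, not a confirmed cause. A sketch that prints what PATH resolves each tool to; describe_environment is a hypothetical helper name.

```python
import shutil
import sys


def describe_environment(tools=("python", "pip", "markitdown")):
    """Map each tool name to the executable found on PATH (or None)."""
    report = {"sys.executable": sys.executable}
    for tool in tools:
        # shutil.which looks the name up on PATH, similar to how a shell
        # resolves a command (aliases and shell functions aside).
        report[tool] = shutil.which(tool)
    return report


if __name__ == "__main__":
    for name, path in describe_environment().items():
        print(f"{name:>16}: {path}")
```

If the pip and markitdown paths point into different environments, the install and the run are not using the same packages.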
Hmm, how do you folks typically install packages with optional dependencies?
Indeed, with zsh or fish, quoting is necessary and should be sufficient: pip install 'markitdown[all]'
You could also write a requirements.txt containing:
markitdown[all]
and then run pip install -r requirements.txt (pip reads the file itself, so the shell never sees the brackets and no quoting is needed).
Thanks for the reply. Apparently the issue was the type of quotes (as it quite often is in Azure Pipelines): pip install "markitdown[all]" worked, whereas pip install 'markitdown[all]' did not.
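Another way to sidestep shell quoting entirely, sketched under the assumption that running a Python step is acceptable in your pipeline: pass pip its arguments as a list through subprocess, so no shell ever parses the brackets or the quotes. pip_install_argv and pip_install are hypothetical helper names. Invoking pip as sys.executable -m pip also ties the install to the same interpreter that runs the script.

```python
import subprocess
import sys


def pip_install_argv(*requirements):
    """Build a pip command that runs under the current interpreter."""
    # Each list element reaches pip as exactly one argument, with no shell in
    # between, so "markitdown[all]" needs no quoting at all.
    return [sys.executable, "-m", "pip", "install", *requirements]


def pip_install(*requirements):
    """Run pip; raises subprocess.CalledProcessError if the install fails."""
    subprocess.run(pip_install_argv(*requirements), check=True)


if __name__ == "__main__":
    # Only prints the command; call pip_install("markitdown[all]") to install.
    print(pip_install_argv("markitdown[all]"))
```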