
feat(magika): make magika an optional dependency

Open • SyedaAnshrahGillani opened this pull request 5 months ago • 2 comments

Description:

This pull request makes the magika library an optional dependency of the markitdown package. Previously, magika (and its dependency onnxruntime) was always installed, even for users who never used its functionality.

By making magika optional, users can now choose to install it only if they require its features, leading to a smaller and more efficient installation for most users.

Changes:

  • packages/markitdown/pyproject.toml: Moved onnxruntime into a new optional dependency group magika.
  • packages/markitdown/src/markitdown/_markitdown.py:
    • Removed the direct import of magika.
    • Implemented a conditional import for magika within the MarkItDown class's __init__ method.
    • Added a check in _get_stream_info_guesses to ensure magika is available before attempting to use it, gracefully skipping its functionality if not installed.
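The conditional import and graceful skip described in the Changes list can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: StreamInfoGuesser, import_optional and get_stream_info_guesses are hypothetical names standing in for the MarkItDown internals, and the magika calls (a Magika class plus identify_bytes(...).output.label) assume a 0.6-style magika API that may differ in other releases.

```python
import importlib
from types import ModuleType
from typing import Any, List, Optional


def import_optional(name: str) -> Optional[ModuleType]:
    """Import an optional dependency, returning None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


class StreamInfoGuesser:
    """Hypothetical stand-in for the relevant parts of MarkItDown (not its real code)."""

    def __init__(self, magika_module: str = "magika") -> None:
        # Conditional import inside __init__: a missing package is not an error.
        module = import_optional(magika_module)
        # Assumption: the magika package exposes a Magika class.
        self._magika: Optional[Any] = module.Magika() if module is not None else None

    def get_stream_info_guesses(self, data: bytes, extension: str = "") -> List[str]:
        # Guess from the file extension first; this needs no optional dependency.
        guesses: List[str] = [extension.lstrip(".").lower()] if extension else []
        if self._magika is None:
            # magika is not installed: skip content-based detection gracefully.
            return guesses
        # Assumption: identify_bytes() returns a result whose output.label names
        # the detected content type (magika 0.6-style API; older releases differ).
        label = self._magika.identify_bytes(data).output.label
        if label and label not in guesses:
            guesses.append(label)
        return guesses
```

On a machine without magika, StreamInfoGuesser() still constructs cleanly and simply returns extension-based guesses; the heavier onnxruntime-backed detection only runs when the optional extra is installed.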

SyedaAnshrahGillani • Jul 17 '25 12:07

@SyedaAnshrahGillani please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

SyedaAnshrahGillani • Jul 17 '25 12:07

Of all the open PRs in this repo, this is the one I'd most like to see merged.

josteink • Nov 21 '25 09:11

Could this PR get some love, pretty please? @afourney

Calling in a favour as a .NET member and otherwise MS champion in GNU Emacs 😄

josteink • Dec 15 '25 10:12