feat(magika): make magika an optional dependency
Description:
This pull request makes the magika library an optional dependency in the markitdown package. Previously, magika (and its dependency onnxruntime) was a mandatory installation, even for users who did not utilize its functionality.
With magika optional, users install it only when they need its features, which keeps the default installation smaller.
Changes:
- packages/markitdown/pyproject.toml: Moved onnxruntime into a new optional dependency group magika.
- packages/markitdown/src/markitdown/_markitdown.py:
  - Removed the direct import of magika.
  - Implemented a conditional import for magika within the MarkItDown class's __init__ method.
  - Added a check in _get_stream_info_guesses to ensure magika is available before attempting to use it, gracefully skipping its functionality if not installed.
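
For reviewers, the conditional-import pattern described in the changes above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ConverterSketch`, `_load_optional`, `has_detector`, and `guess_stream_info` are hypothetical names, and no magika API is called; the sketch only shows how a missing optional module can be tolerated at construction time and checked before use.

```python
import importlib
from typing import Any, List, Optional


def _load_optional(module_name: str) -> Optional[Any]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # optional dependency lands here instead of crashing the caller.
        return None


class ConverterSketch:
    """Hypothetical stand-in for the MarkItDown class (assumption, not the real code)."""

    def __init__(self, detector_module: str = "magika") -> None:
        # Conditional import: resolved when the object is built,
        # not when this module is imported.
        self._detector_module = _load_optional(detector_module)

    @property
    def has_detector(self) -> bool:
        return self._detector_module is not None

    def guess_stream_info(self, filename: str) -> List[str]:
        guesses: List[str] = []
        # Extension-based guess: always available, needs no extra dependency.
        if "." in filename:
            guesses.append("extension:" + filename.rsplit(".", 1)[1].lower())
        # Content-based detection is attempted only if the optional module loaded.
        if self.has_detector:
            guesses.append("content-detector:available")
        return guesses
```

With the optional module absent, `guess_stream_info` simply returns the extension-based guess, which mirrors the "gracefully skipping" behaviour described for _get_stream_info_guesses.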
@SyedaAnshrahGillani please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
@microsoft-github-policy-service agree [company="{your company}"]
Options:
- (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
  @microsoft-github-policy-service agree
- (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
  @microsoft-github-policy-service agree company="Microsoft"
@microsoft-github-policy-service agree
Of all PRs I'd like to see merged for this repo, this is the one I definitely would like to see merged the most.
Could this PR get some love, pretty please? @afourney
Calling in a favour as a .NET member and otherwise MS champion in GNU Emacs 😄