[Enhancement] Override Converters Priority

Open AjjayK opened this issue 7 months ago • 0 comments

Problem

Currently, you can assign a priority to your plugin's converters during development, and a lower value makes a converter run before the built-in ones. However, the current version of the library doesn't support overriding converter priorities manually.

Solution

The idea is to pass priority values manually when instantiating the MarkItDown class.

In my case, I am building a plugin with multiple converters, and I would like them to execute in different orders based on use case.

A simple change to the MarkItDown class can achieve this.

Changes to the class

class MarkItDown:
    def __init__(
        self,
        *,
        enable_builtins: Union[None, bool] = None,
        enable_plugins: Union[None, bool] = None,
        **kwargs,
    ):
        self._builtins_enabled = False
        self._plugins_enabled = False
        
        # Store converter priorities from kwargs
        self._converter_priorities = kwargs.get("converter_priorities", {})

Changes to the register_converter method

    def register_converter(
        self,
        converter: DocumentConverter,
        *,
        priority: float = PRIORITY_SPECIFIC_FILE_FORMAT,
    ) -> None:

        # If priority is defined, then override for the converter
        converter_type = type(converter).__name__
        if converter_type in self._converter_priorities:
            priority = self._converter_priorities[converter_type]
            
        self._converters.insert(
            0, ConverterRegistration(converter=converter, priority=priority)
        )
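To illustrate how the two changes interact, here is a minimal, self-contained sketch. It does not import markitdown: MiniMarkItDown, PdfConverter, MyPluginConverter, and execution_order are hypothetical stand-ins, the constant's value of 0.0 is assumed for illustration, and the stable sort by ascending priority is an assumption about how the library orders converters at conversion time.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed value for illustration; the real constant lives in the library.
PRIORITY_SPECIFIC_FILE_FORMAT = 0.0


class DocumentConverter:
    """Stand-in for the library's converter base class."""


class PdfConverter(DocumentConverter):
    """Hypothetical built-in converter."""


class MyPluginConverter(DocumentConverter):
    """Hypothetical plugin converter."""


@dataclass
class ConverterRegistration:
    converter: DocumentConverter
    priority: float


class MiniMarkItDown:
    """Hypothetical, stripped-down stand-in for MarkItDown."""

    def __init__(self, **kwargs) -> None:
        self._converters: List[ConverterRegistration] = []
        # Proposed change: per-converter overrides keyed by class name.
        self._converter_priorities: Dict[str, float] = kwargs.get(
            "converter_priorities", {}
        )

    def register_converter(
        self,
        converter: DocumentConverter,
        *,
        priority: float = PRIORITY_SPECIFIC_FILE_FORMAT,
    ) -> None:
        # Proposed change: a user-supplied priority wins over the one
        # passed by whoever registers the converter.
        converter_type = type(converter).__name__
        if converter_type in self._converter_priorities:
            priority = self._converter_priorities[converter_type]
        self._converters.insert(
            0, ConverterRegistration(converter=converter, priority=priority)
        )

    def execution_order(self) -> List[str]:
        # Assumption: converters are tried in ascending priority order,
        # using a stable sort so ties keep their list order.
        ordered = sorted(self._converters, key=lambda r: r.priority)
        return [type(r.converter).__name__ for r in ordered]


# The plugin registers itself with priority 5.0 (after the built-in),
# but the user overrides it to -1.0 so it runs first.
md = MiniMarkItDown(converter_priorities={"MyPluginConverter": -1.0})
md.register_converter(PdfConverter())
md.register_converter(MyPluginConverter(), priority=5.0)
order = md.execution_order()
print(order)
```

Because the override is keyed on type(converter).__name__, two converters that share a class name would share one override as well.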

Now the priorities can be passed as the converter_priorities keyword argument when instantiating the MarkItDown class, keyed by converter class name, which allows flexibility in execution order. If approved, please assign this issue to me; I can submit a PR to implement this enhancement. Thank you!

AjjayK • May 11 '25 04:05