markitdown
[Enhancement] Override Converters Priority
Problem
Currently, you can assign a priority to your plugin's converters during development, and a lower value makes the plugin's converter run before the built-in ones. However, the current version of the library doesn't let the user pass converter priorities manually.
Solution
The idea is to pass priority values manually when instantiating the MarkItDown class.
In my case, I am building a plugin with multiple converters, and I would like them to execute in a different order depending on the use case.
A simple change to the MarkItDown class can achieve this.
Changes to class
class MarkItDown:
    def __init__(
        self,
        *,
        enable_builtins: Union[None, bool] = None,
        enable_plugins: Union[None, bool] = None,
        **kwargs,
    ):
        self._builtins_enabled = False
        self._plugins_enabled = False
        # Store converter priorities from kwargs
        self._converter_priorities = kwargs.get("converter_priorities", {})
Changes to register_converter method
def register_converter(
    self,
    converter: DocumentConverter,
    *,
    priority: float = PRIORITY_SPECIFIC_FILE_FORMAT,
) -> None:
    # If a priority is defined for this converter type, override the default
    converter_type = type(converter).__name__
    if converter_type in self._converter_priorities:
        priority = self._converter_priorities[converter_type]
    self._converters.insert(
        0, ConverterRegistration(converter=converter, priority=priority)
    )
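To illustrate the intended behaviour, here is a minimal, self-contained sketch. It does not import markitdown: `MiniMarkItDown`, `PdfLikeConverter`, and `TextLikeConverter` are hypothetical stand-ins, the priority constants are placeholder values, and the sketch assumes converters are tried in ascending priority order using a stable sort.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Placeholder values for this sketch; the real constants live in markitdown.
PRIORITY_SPECIFIC_FILE_FORMAT = 0.0
PRIORITY_GENERIC_FILE_FORMAT = 10.0


class DocumentConverter:
    """Stand-in base class (hypothetical, for illustration only)."""


class PdfLikeConverter(DocumentConverter):  # hypothetical converter
    pass


class TextLikeConverter(DocumentConverter):  # hypothetical converter
    pass


@dataclass
class ConverterRegistration:
    converter: DocumentConverter
    priority: float


class MiniMarkItDown:
    """Toy registry that mirrors only the proposed override logic."""

    def __init__(self, *, converter_priorities: Optional[Dict[str, float]] = None):
        self._converter_priorities = dict(converter_priorities or {})
        self._converters: List[ConverterRegistration] = []

    def register_converter(
        self,
        converter: DocumentConverter,
        *,
        priority: float = PRIORITY_SPECIFIC_FILE_FORMAT,
    ) -> None:
        # A user-supplied priority, keyed by class name, wins over the default.
        name = type(converter).__name__
        priority = self._converter_priorities.get(name, priority)
        self._converters.insert(0, ConverterRegistration(converter, priority))

    def execution_order(self) -> List[str]:
        # Assumption: lower priority values are tried first (stable sort).
        ordered = sorted(self._converters, key=lambda r: r.priority)
        return [type(r.converter).__name__ for r in ordered]


def build(priorities: Optional[Dict[str, float]] = None) -> MiniMarkItDown:
    md = MiniMarkItDown(converter_priorities=priorities)
    md.register_converter(TextLikeConverter(), priority=PRIORITY_GENERIC_FILE_FORMAT)
    md.register_converter(PdfLikeConverter())
    return md


default_order = build().execution_order()
overridden_order = build({"TextLikeConverter": -1.0}).execution_order()
print(default_order)
print(overridden_order)
```

One consequence of keying the overrides by `type(converter).__name__` is that two converter classes with the same name in different modules would share a single override, so a fully qualified name might be a safer key if that is a concern.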
Now the priorities can be passed as an argument to the MarkItDown class, allowing flexibility in execution order. If approved, please assign this issue to me; I can submit a PR to implement this enhancement. Thank you!