markitdown
refactor: split _markitdown.py into modular components
Description
This PR addresses the growing complexity of _markitdown.py by splitting it into smaller, more focused modules. The changes improve code organization and maintainability while preserving all existing functionality.
Changes
- Created a new converters/ package to house different converter implementations
- Split converters into logical groups (document, web, media, text, archive)
- Moved core MarkItDown class functionality to core.py
- Separated exception classes into exceptions.py
- Updated imports and tests to reflect new structure
Testing
- All existing tests pass without modification
- Verified no functionality changes
Implementation Details
The refactoring follows these principles:
- Single Responsibility: Each module handles a specific type of conversion
- Open/Closed: New converters can be added without modifying existing code
- Interface Segregation: Clear base class and consistent converter interface
- Dependency Inversion: Core MarkItDown class depends on abstractions
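The principles above can be illustrated with a minimal sketch. This is not the code from the PR: the names `BaseConverter`, `PlainTextConverter`, `Dispatcher`, and `ConverterError` are hypothetical stand-ins chosen for illustration, and the real base class and its method signature in markitdown may differ.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ConverterError(Exception):
    """Hypothetical error type, standing in for something exceptions.py might define."""


class BaseConverter(ABC):
    """Hypothetical converter interface (assumption, not the library's actual API).

    A converter returns Markdown text, or None if it does not handle the input.
    """

    @abstractmethod
    def convert(self, path: str) -> Optional[str]:
        ...


class PlainTextConverter(BaseConverter):
    """Single responsibility: handles only .txt files."""

    def convert(self, path: str) -> Optional[str]:
        if not path.endswith(".txt"):
            return None
        with open(path, encoding="utf-8") as f:
            return f.read()


class Dispatcher:
    """Stand-in for the core class: depends only on the BaseConverter abstraction."""

    def __init__(self) -> None:
        self._converters: List[BaseConverter] = []

    def register(self, converter: BaseConverter) -> None:
        # Open/closed: new converters are added here without editing existing ones.
        self._converters.append(converter)

    def convert(self, path: str) -> str:
        # Try each registered converter in order; the first non-None result wins.
        for converter in self._converters:
            result = converter.convert(path)
            if result is not None:
                return result
        raise ConverterError(f"no converter accepted {path!r}")
```

In this sketch, supporting a new format means writing one more `BaseConverter` subclass and registering it; neither the dispatcher nor the existing converters need to change.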
Migration Notes
This is a non-breaking change as all public APIs remain unchanged. Internal imports are updated to reflect the new structure.
@microsoft-github-policy-service agree
Thanks for the work on this. It was included in a recent refactor for 0.1.0a1.