Converting Embedded image from Documents
Pull Request
Description
This PR introduces the following changes:
-
Initialization of New Attributes:
- Added
_mlm_clientand_mlm_modelattributes to thePptxConverterclass, initialized using thekwargsdictionary.
- Added
-
Handling of Image Shapes:
- Integrated a new method
_convert_image_to_markdownto handle the conversion of image shapes to markdown within the presentation slides processing loop.
- Integrated a new method
-
Handling of image within DataURI:
- Integrated a new validation to identify DataURIs of the image type and, if the LLM model has been defined, converts the image to markdown.
-
Addition of
_convert_image_to_markdownMethod:- Added a new method
_convert_image_to_markdownto thePptxConverterclass to convert image shapes to markdown format.
- Added a new method
Related Issue
Link to the related issue (if any).
Motivation and Context
- The new attributes
_mlm_clientand_mlm_modelare required for additional functionality. - The
_convert_image_to_markdownmethod improves the handling of image shapes by converting them to markdown format, enhancing the overall functionality of thePptxConverterclass. - The new feature that identifying and converting image-type DataURIs improves handling of documents (such as .docx) that have embedded images, enhancing the overall functionality of the
_CustomMarkdownifyclass and its dependents.
How Has This Been Tested?
- [ ] Unit tests
- [ ] Integration tests
- [ X ] Manual testing
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.
Screenshots (if appropriate):
Types of changes
- [ ] Bug fix
- [ X ] New feature
- [ ] Breaking change
- [ ] Documentation update
Checklist:
- [ X ] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [ X ] I have added tests to cover my changes.
- [ X ] All new and existing tests passed.
- [ X ] The title of my pull request is a short description of the requested changes.
Additional Notes
This new feature reflects over .pptx, .docx and .html (including extends classes)
please expand the pr description.
please expand the pr description.
Pull Request
Description
This PR introduces the following changes:
-
Initialization of New Attributes:
- Added
_mlm_clientand_mlm_modelattributes to thePptxConverterclass, initialized using thekwargsdictionary.
- Added
-
Handling of Image Shapes:
- Integrated a new method
_convert_image_to_markdownto handle the conversion of image shapes to markdown within the presentation slides processing loop.
- Integrated a new method
-
Addition of
_convert_image_to_markdownMethod:- Added a new method
_convert_image_to_markdownto thePptxConverterclass to convert image shapes to markdown format.
- Added a new method
Related Issue
Link to the related issue (if any).
Motivation and Context
- The new attributes
_mlm_clientand_mlm_modelare required for additional functionality. - The
_convert_image_to_markdownmethod improves the handling of image shapes by converting them to markdown format, enhancing the overall functionality of thePptxConverterclass.
How Has This Been Tested?
- [ ] Unit tests
- [ ] Integration tests
- [x] Manual testing
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.
Screenshots (if appropriate):
Types of changes
- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
- [ ] Documentation update
Checklist:
- [x] My code follows the code style of this project.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [x] I have added tests to cover my changes.
- [x] All new and existing tests passed.
- [x] The title of my pull request is a short description of the requested changes.
Additional Notes
Add any additional information or context.
This is a mandate feature. When can this PR be merged?