feat(converter): add video converter.

Open absadiki opened this issue 11 months ago • 4 comments

This PR is a step towards resolving #154.

It introduces a VideoConverter class that converts videos to markdown by:

  • Extracting metadata (if exiftool is installed)
  • Performing speech transcription (if speech_recognition and pydub are installed)
  • Generating a summary of the transcription via a multimodal LLM (optional; defaults to True if llm_client is configured)
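
As a rough illustration of how the three optional steps above could be gated, here is a minimal sketch that probes the environment using only the standard library. The function name `available_video_steps` and the returned keys are hypothetical and are not taken from the PR; it also assumes the LLM summary depends on a transcription being available, as the list suggests.

```python
import importlib.util
import shutil


def available_video_steps(llm_client=None):
    """Report which optional conversion steps could run here.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Metadata extraction needs the exiftool binary on PATH.
    has_exiftool = shutil.which("exiftool") is not None

    # Transcription needs both optional Python packages to be importable.
    has_speech = all(
        importlib.util.find_spec(name) is not None
        for name in ("speech_recognition", "pydub")
    )

    return {
        "metadata": has_exiftool,
        "transcription": has_speech,
        # Assumption: the summary is built from the transcription,
        # so it needs both a transcript and a configured client.
        "llm_summary": has_speech and llm_client is not None,
    }
```

Calling `available_video_steps()` without a client always reports `llm_summary` as unavailable, which mirrors the "only if llm_client is configured" behaviour described above.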

Notes:

  • I believe checking the file type based on the extension is not ideal. There are many video extensions, and I think checking the mime_type would be a better approach, as it can cover a wider range of video files.
  • I’m unsure about the testing strategy. Should we focus only on testing exiftool? Please share your thoughts on this.
  • Additionally, I suggest refactoring Mp3Converter into a more general AudioConverter, as there are many audio extensions to consider. If you agree with this, I can submit a separate PR for it.
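
To make the mime_type suggestion concrete, here is a minimal sketch using Python's standard `mimetypes` module. The helper name `looks_like_video` is hypothetical. Note that `mimetypes.guess_type` still derives its answer from the file name, so this mainly replaces a hand-maintained extension list with the whole `video/*` family; sniffing the actual file content would need a different tool.

```python
import mimetypes


def looks_like_video(path):
    """Return True if the guessed MIME type is in the video/* family.

    Hypothetical helper for illustration; not part of the PR.
    """
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type is not None and mime_type.startswith("video/")
```

With this check, `looks_like_video("clip.mov")` and `looks_like_video("clip.mp4")` both pass without listing either extension explicitly, while an `.mp3` file is rejected because its type falls under `audio/*`. The same pattern with an `audio/` prefix would fit the proposed AudioConverter.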

absadiki avatar Dec 21 '24 19:12 absadiki

Could you add tests?

l-lumin avatar Dec 22 '24 07:12 l-lumin

@l-lumin, could you provide a sample video file that is allowed to be uploaded to the repo?

absadiki avatar Dec 22 '24 19:12 absadiki

I think you can use the file you tested locally. If it turns out to be wrong, we can change it later.

l-lumin avatar Dec 23 '24 10:12 l-lumin

@l-lumin, okay, I created a sample video file using ffmpeg. I've added a test for exiftool for now. Maybe we can add tests for transcription as well, but #194 should be merged first.

absadiki avatar Dec 23 '24 19:12 absadiki
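
For readers who want a small, license-free test asset, one way to generate a synthetic clip with ffmpeg is sketched below. The exact command used in the PR is not stated in the thread, so the output file name, the duration, and the choice of the `testsrc` and `sine` generators are assumptions made for illustration.

```python
import shutil
import subprocess


def sample_video_command(output_path, seconds=2):
    """Build an ffmpeg command for a tiny synthetic clip with audio.

    Hypothetical helper; the PR's actual command is not shown in the thread.
    """
    return [
        "ffmpeg", "-y",  # overwrite the output file if it exists
        # Synthetic video: ffmpeg's built-in test pattern.
        "-f", "lavfi", "-i", f"testsrc=duration={seconds}:size=160x120:rate=10",
        # Synthetic audio: a 440 Hz sine tone.
        "-f", "lavfi", "-i", f"sine=frequency=440:duration={seconds}",
        "-pix_fmt", "yuv420p",  # widely supported pixel format
        "-shortest",  # stop when the shorter input ends
        output_path,
    ]


def make_sample_video(output_path, seconds=2):
    """Run the command above, failing clearly if ffmpeg is missing."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(sample_video_command(output_path, seconds), check=True)
```

A generated clip like this carries no copyright concerns, which addresses the earlier question about which files may be uploaded to the repo. A sine tone contains no speech, though, so it would exercise metadata extraction but not transcription.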