langchaingo

llms: Add support for using the whisper model to transcribe audio

Open devalexandre opened this issue 1 year ago • 3 comments

PR Checklist

  • [x] Read the Contributing documentation.
  • [x] Read the Code of conduct documentation.
  • [x] Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as memory: add interfaces for X, Y or util: add whizzbang helpers).
  • [x] Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
  • [x] Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. Fixes #123).
  • [x] Describes the source of new concepts.
  • [ ] References existing implementations as appropriate.
  • [x] Contains test coverage for new functions.
  • [x] Passes all golangci-lint checks.

devalexandre, Mar 20 '24 04:03

I agree with @eliben's intuition here: I'm not sure audio transcription as a concept fits into our llm namespace. I'm open to exposing this and generalizing over providers, but I think it belongs in a different namespace.

tmc, Mar 26 '24 20:03

@tmc, @eliben

What would be the implementation idea for this functionality? Maybe use openai.TranscribeAudio, keeping it within the openai package only and out of the LLM namespace?

I think we could do it the way LangChain.js does, as a document loader: https://js.langchain.com/docs/integrations/document_loaders/file_loaders/openai_whisper_audio

devalexandre, Mar 27 '24 13:03

@tmc any update?

devalexandre, Apr 23 '24 12:04