langchaingo
llms: Add support for using the whisper model to transcribe audio
PR Checklist
- [x] Read the Contributing documentation.
- [x] Read the Code of conduct documentation.
- [x] Name your Pull Request title clearly, concisely, and prefixed with the name of the primarily affected package you changed according to Good commit messages (such as `memory: add interfaces for X, Y` or `util: add whizzbang helpers`).
- [x] Check that there isn't already a PR that solves the problem the same way to avoid creating a duplicate.
- [x] Provide a description in this PR that addresses what the PR is solving, or reference the issue that it solves (e.g. Fixes #123).
- [x] Describes the source of new concepts.
- [ ] References existing implementations as appropriate.
- [x] Contains test coverage for new functions.
- [x] Passes all `golangci-lint` checks.
I agree with @eliben's intuition here; I'm not sure audio transcription as a concept fits cleanly into our llms namespace. I'm open to exposing this and generalizing over providers, but I think it belongs in a different namespace.
@tmc , @eliben
What would be the implementation idea for this functionality? Maybe use `openai.TranscribeAudio`, keeping it only within the openai package rather than in the llms namespace?
I think one way to use it would be as a document loader, like https://js.langchain.com/docs/integrations/document_loaders/file_loaders/openai_whisper_audio
@tmc any updates?