semantic-kernel

.Net: Add AssemblyAI connector

Open Swimburger opened this issue 1 year ago • 4 comments

Motivation and Context

AssemblyAI is a speech AI company offering AI models through APIs. Adding a connector will help users integrate AssemblyAI easily with Semantic Kernel.

Description

This issue tracks progress on the AssemblyAI connector. The current implementation is on the AssemblyAI branch.

TODO

  1. AudioToTextService
  • [x] GetTextContentsAsync using AudioContent (#5094)
  • [x] GetTextContentsAsync using AudioStreamContent (#5094) (deprecated in favor of file service)
  • [x] Add DI extensions (#5094)
  • [ ] Add AssemblyAI file service to upload files (#5964)
  • [ ] Return typed class in TextContent.InnerContent
  • [ ] Add all transcript parameters to AssemblyAIAudioToTextExecutionSettings

Potential additions

  • Add real-time speech-to-text

Mar 08 '24 21:03 Swimburger

I noticed that the IAudioToTextService.GetTextContentsAsync method returns multiple TextContent objects. We have one API that returns the transcript as sentences and another that returns it as paragraphs. Would it make sense to add an option to AssemblyAIAudioToTextExecutionSettings that controls whether the transcript is returned as a single TextContent, one TextContent per sentence, or one TextContent per paragraph?
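A minimal sketch of what such an option could look like. Everything here is hypothetical and for illustration only: `TranscriptGranularity`, `TranscriptData`, and `TranscriptSplitter` are not part of Semantic Kernel or the connector, and the strings stand in for the TextContent items the service would return.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical option; not an existing Semantic Kernel or AssemblyAI type.
public enum TranscriptGranularity { Full, Sentences, Paragraphs }

// Hypothetical stand-in for the data returned by the transcript,
// sentences, and paragraphs APIs.
public sealed record TranscriptData(
    string Text,
    IReadOnlyList<string> Sentences,
    IReadOnlyList<string> Paragraphs);

public static class TranscriptSplitter
{
    // Picks the text segments that would each become one TextContent.
    public static IReadOnlyList<string> ToTextContents(
        TranscriptData transcript, TranscriptGranularity granularity) =>
        granularity switch
        {
            TranscriptGranularity.Sentences => transcript.Sentences,
            TranscriptGranularity.Paragraphs => transcript.Paragraphs,
            _ => new[] { transcript.Text }, // default: one item for the whole transcript
        };
}
```

Defaulting to the full transcript would keep the current single-result behavior for callers that never set the option.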

Mar 08 '24 21:03 Swimburger

I would also add full realtime transcription to the TODO: you send AudioContent or AudioStreamContent and you get back an IAsyncEnumerable<StreamingTextContent>.

Mar 08 '24 22:03 Krzysztof318

> I would add to todo also full realtime transcribing, so you send AudioContent or AudioStreamContent and you get IAsyncEnumerable<StreamingTextContent>

I want to add realtime, but I want to finalize and release non-realtime transcription first.

Our realtime solution uses a WebSocket connection, expects raw audio bytes to be sent continuously, and responds with partial and final transcript objects. This is mostly consistent with other realtime transcription services. I'd be happy to work with y'all in figuring out how to create a good abstraction that'll work for us and other realtime services.
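As a starting point for that discussion, here is a rough sketch of one possible shape for such an abstraction. All of the names (`StreamingTranscript`, `IRealtimeAudioToTextService`, `FakeRealtimeService`) are hypothetical and exist in neither Semantic Kernel nor the AssemblyAI SDK. The in-memory fake only imitates the flow of partial and final results; a real implementation would hold a WebSocket connection and would not necessarily emit one partial per audio chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type: a partial or final transcript.
public sealed record StreamingTranscript(string Text, bool IsFinal);

// Hypothetical abstraction: raw audio chunks in, transcripts out.
public interface IRealtimeAudioToTextService
{
    IAsyncEnumerable<StreamingTranscript> TranscribeAsync(
        IAsyncEnumerable<ReadOnlyMemory<byte>> audioChunks,
        CancellationToken cancellationToken = default);
}

// In-memory fake standing in for a WebSocket-backed implementation.
public sealed class FakeRealtimeService : IRealtimeAudioToTextService
{
    public async IAsyncEnumerable<StreamingTranscript> TranscribeAsync(
        IAsyncEnumerable<ReadOnlyMemory<byte>> audioChunks,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        long totalBytes = 0;
        await foreach (var chunk in audioChunks.WithCancellation(cancellationToken))
        {
            totalBytes += chunk.Length;
            // Emit a partial result after each chunk received.
            yield return new StreamingTranscript($"partial after {totalBytes} bytes", IsFinal: false);
        }
        // Emit one final result once the audio stream ends.
        yield return new StreamingTranscript($"final after {totalBytes} bytes", IsFinal: true);
    }
}
```

Taking the audio as an IAsyncEnumerable keeps the caller in control of pacing and lets the same interface sit in front of any service that streams raw bytes over a socket.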

Mar 08 '24 22:03 Swimburger

Instead of using the AudioStreamContent, I'm introducing an AssemblyAI file service for users to upload their files to AssemblyAI. #5964

In the future, we can use a streaming audio content class for Streaming STT.

May 01 '24 20:05 Swimburger

Now that we have the AssemblyAIAudioToTextService and AssemblyAIFileService in, I think we can release the initial version of this connector. What would the next steps be?

Jun 10 '24 13:06 Swimburger

This PR uses the AssemblyAI SDK: https://github.com/microsoft/semantic-kernel/pull/8556

Oct 16 '24 20:10 Swimburger