go-livepeer

(feat) configurable timestamp options for audio-to-text

Open • eliteprox opened this pull request 1 year ago • 1 comment

What does this pull request do? Explain your changes. (required)

This change adds a return_timestamps parameter to the audio-to-text pipeline, allowing end users to configure the inference job to return word-level timestamps, sentence-level timestamps, or no timestamps at all.

Supported values for return_timestamps are false and word. When the parameter is omitted, the pipeline keeps its existing behavior of returning sentence-level timestamps, so existing applications do not break.
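The mapping from the three timestamp behaviors to request fields can be sketched as a small client-side shell helper that follows the curl examples in this PR. The mode names sentence, word and none, and the helpers timestamp_field and transcribe, are hypothetical and exist only in this sketch; the gateway itself only ever sees return_timestamps=word, return_timestamps=false, or no return_timestamps field at all.

```shell
#!/bin/sh
# Hypothetical helper: print the extra form field for a timestamp mode.
# "sentence", "word" and "none" are illustrative mode names, not API values.
timestamp_field() {
  case "$1" in
    sentence) ;;  # no field: the pipeline defaults to sentence-level timestamps
    word) printf '%s\n' 'return_timestamps=word' ;;
    none) printf '%s\n' 'return_timestamps=false' ;;
    *) printf 'unsupported timestamp mode: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: transcribe GATEWAY AUDIO_FILE MODE
# Sends one audio-to-text request and prints the gateway's response.
# Variables are global because POSIX sh has no "local".
transcribe() {
  gateway=$1; audio=$2; mode=$3
  field=$(timestamp_field "$mode") || return 1  # reject unsupported modes early
  if [ -n "$field" ]; then
    curl -sS -X POST "https://$gateway/audio-to-text" \
      -F model_id=openai/whisper-large-v3 \
      -F "audio=@$audio" \
      -F "$field"
  else
    curl -sS -X POST "https://$gateway/audio-to-text" \
      -F model_id=openai/whisper-large-v3 \
      -F "audio=@$audio"
  fi
}
```

Validating on the client only mirrors the two supported values listed above; omitting the field, rather than sending an explicit sentence-level value, keeps the default request identical to what existing applications already send.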

Specific updates (required)

  • This change only updates the go.mod references for ai-worker. See https://github.com/livepeer/ai-worker/pull/228

How did you test each of these updates? (required)

sentence-level timestamps

  • Sent a request without the return_timestamps parameter to verify that the inference job still defaults to sentence-level timestamps: sentence-timestamps.json
curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE>

word-level timestamps

  • Sent a request with return_timestamps=word to verify that timestamps are returned at word level: word-timestamps.json
curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE> \
    -F return_timestamps="word"

no timestamps

  • Sent a request with return_timestamps=false to verify that timestamps are excluded: no-timestamps.json
curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE> \
    -F return_timestamps="false"
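The three manual checks above can be replayed in one go with a sketch like the following. GATEWAY_IP and AUDIO_FILE are hypothetical shell variables standing in for the <GATEWAY_IP> and <PATH_TO_FILE> placeholders, and run_case and run_all_cases are illustrative helper names; each response is written to the file name listed for that case.

```shell
#!/bin/sh
# Assumption: GATEWAY_IP and AUDIO_FILE are set by the caller and replace
# the <GATEWAY_IP> and <PATH_TO_FILE> placeholders from the curl commands.

# run_case OUTPUT_FILE [EXTRA_FORM_ARGS...]
# Sends one audio-to-text request and saves the response to OUTPUT_FILE.
run_case() {
  out=$1
  shift
  # "$@" expands to nothing for the default (sentence-level) case,
  # or to: -F return_timestamps=... for the other two cases.
  curl -sS -X POST "https://${GATEWAY_IP}/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F "audio=@${AUDIO_FILE}" \
    "$@" -o "$out"
}

# Replays the three test cases, stopping at the first failed request.
run_all_cases() {
  run_case sentence-timestamps.json &&
    run_case word-timestamps.json -F return_timestamps=word &&
    run_case no-timestamps.json -F return_timestamps=false
}
```

Passing the optional field through "$@" means the sentence-level case sends exactly the model_id and audio fields of the first curl command, with no return_timestamps field, which is the default path this PR keeps unchanged.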

Does this pull request close any open issues?

AI-630

Checklist:

  • [x] Read the contribution guide
  • [x] make runs successfully
  • [x] All tests in ./test.sh pass
  • [x] README and other documentation updated
  • [x] Pending changelog updated

eliteprox, Oct 15 '24 17:10