go-livepeer

(feat) configurable timestamp options for audio-to-text

Open · eliteprox opened this pull request 1 year ago · 1 comment

What does this pull request do?

This change adds a return_timestamps parameter to the audio-to-text pipeline, allowing end users to request word-level timestamps, sentence-level timestamps, or no timestamps at all.

The supported values for return_timestamps are false (no timestamps) and word (word-level timestamps). When the parameter is omitted, the pipeline keeps its existing behavior of sentence-level timestamps, so existing applications are not broken.
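A minimal sketch of how a client script might map the three behaviors onto the form field, assuming a Bash client. The helper name timestamp_form_field and the sentence/word/none mode labels are hypothetical; only return_timestamps and its values word and false come from this PR.

```bash
#!/usr/bin/env bash
# Hypothetical client-side helper (not part of go-livepeer or ai-worker).
# Prints the return_timestamps form field for a requested granularity,
# or nothing when the pipeline default (sentence-level) is wanted.
timestamp_form_field() {
  case "$1" in
    sentence)
      # Omit the parameter: the pipeline keeps its sentence-level default.
      ;;
    word)
      printf '%s\n' 'return_timestamps=word'
      ;;
    none)
      printf '%s\n' 'return_timestamps=false'
      ;;
    *)
      printf 'unsupported granularity: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

For example, field=$(timestamp_form_field word) sets field to return_timestamps=word, which can then be passed to curl as -F "$field"; for sentence the output is empty and the flag should be left off entirely, which is what keeps older clients on the default behavior.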

Specific updates

  • This change only updates the go.mod references to ai-worker; the underlying pipeline change is in https://github.com/livepeer/ai-worker/pull/228

How did you test each of these updates?

sentence-level timestamps

  • Sent a request without the return_timestamps parameter to verify that the inference job still defaults to sentence-level timestamps (result: sentence-timestamps.json)
curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE>

word-level timestamps

  • Sent a request with return_timestamps=word to verify that timestamps are returned at word level (result: word-timestamps.json)
curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE> \
    -F return_timestamps="word"

no timestamps

  • Sent a request with return_timestamps=false to verify that timestamps are excluded (result: no-timestamps.json)
curl -X POST "https://<GATEWAY_IP>/audio-to-text" \
    -F model_id=openai/whisper-large-v3 \
    -F audio=@<PATH_TO_FILE> \
    -F return_timestamps="false"
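The three manual checks above can be replayed with one script. This is a sketch assuming Bash; run_timestamp_checks and the CURL override used for dry runs are hypothetical, and the output file names simply mirror the result files named above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replays the three audio-to-text checks from this PR.
# $1 = gateway host (the <GATEWAY_IP> placeholder), $2 = audio file path.
# Set CURL=echo to print the commands instead of sending requests.
run_timestamp_checks() {
  local gateway=$1 audio=$2
  local curl_cmd=${CURL:-curl}
  local mode out
  for mode in sentence word false; do
    local extra=()
    case "$mode" in
      sentence) out=sentence-timestamps.json ;;  # parameter omitted: default
      word)     out=word-timestamps.json; extra=(-F return_timestamps=word) ;;
      false)    out=no-timestamps.json;   extra=(-F return_timestamps=false) ;;
    esac
    # Each response body is written to the file named after its check.
    "$curl_cmd" -X POST "https://${gateway}/audio-to-text" \
      -F model_id=openai/whisper-large-v3 \
      -F "audio=@${audio}" \
      "${extra[@]}" \
      -o "$out" || return 1
  done
}
```

Against a real gateway, run_timestamp_checks <GATEWAY_IP> <PATH_TO_FILE> would leave the three responses side by side in the working directory for comparison; CURL=echo run_timestamp_checks ... only prints the commands.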

Does this pull request close any open issues?

AI-630

Checklist:

  • [x] Read the contribution guide
  • [x] make runs successfully
  • [x] All tests in ./test.sh pass
  • [x] README and other documentation updated
  • [x] Pending changelog updated

eliteprox, Oct 15 '24 17:10

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 35.92244%. Comparing base (c41f3c4) to head (9d5130c). Report is 8 commits behind head on ai-video.


@@                 Coverage Diff                 @@
##            ai-video       #3207         +/-   ##
===================================================
- Coverage   36.07820%   35.92244%   -0.15576%     
===================================================
  Files            124         124                 
  Lines          34525       34658        +133     
===================================================
- Hits           12456       12450          -6     
- Misses         21381       21520        +139     
  Partials         688         688                 

see 1 file with indirect coverage changes
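The percentages in the coverage diff follow from the raw counts, with coverage taken as hits over all tracked lines (hits + misses + partials). Worked through with the numbers above, as a check rather than an explanation of which files changed:

$$
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{partials}}
$$

$$
\text{base: } \frac{12456}{12456 + 21381 + 688} = \frac{12456}{34525} \approx 36.0782\%, \qquad
\text{head: } \frac{12450}{12450 + 21520 + 688} = \frac{12450}{34658} \approx 35.9224\%
$$

$$
\Delta \approx 35.92244\% - 36.07820\% = -0.15576 \text{ percentage points}, \qquad \Delta\text{misses} = 133 + 6 = 139
$$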


Last update 1bc4a6a...9d5130c.

codecov[bot], Oct 23 '24 17:10