transcribe
Add individual word timestamps
It looks like speech currently returns some timestamp information for what look like sections.
Would you be able to add timestamps for individual words?
Something like this: https://cloud.google.com/speech-to-text/docs/async-time-offsets#speech-async-recognize-gcs-python
I think the ideal way to do this would be to add a Block class and a Word class that both extend str, and give each of them a time property. block.time would hold the time of the block (equivalent to the current functionality), while for word in block.words: word.time would give the time of each word in the block.
Then modify speech.Speech.__iter__ to return blocks instead of a (time, string) tuple. Each block would expose its words via Block.words, a list of the individual Word instances, each carrying the time of that word.
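A minimal sketch of the Block and Word idea described above, for anyone picking this up. This is not existing library code: the constructor signatures and the choice to store time as a plain float in seconds are assumptions made for illustration.

```python
class Word(str):
    """A single transcribed word that also carries its time offset."""

    def __new__(cls, text, time):
        instance = super().__new__(cls, text)
        instance.time = time  # assumed: offset in seconds from the start of the audio
        return instance


class Block(str):
    """A block of transcribed text with its own time and a list of Word instances."""

    def __new__(cls, text, time, words=()):
        instance = super().__new__(cls, text)
        instance.time = time  # same role as the time currently returned per section
        instance.words = list(words)
        return instance


# Hypothetical data, only to show how the two classes fit together.
words = [Word("hello", 0.0), Word("world", 0.4)]
block = Block(" ".join(words), time=words[0].time, words=words)

for word in block.words:
    print(word, word.time)
```

Because both classes extend str, existing code that treats a block as plain text keeps working, and the timing data is only touched by callers that ask for it.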
I won't have time to add something like this for a bit. If you want to take a stab at it, I'm happy to code review the pull requests and answer any questions.
In the meantime, you can work around this with a custom script, something like this:
s = Speech(path_to_sound_file, lang='en-US')
google_response = s.scan()
for result in google_response.results:
    # Use the first transcription alternative for each result
    for word_info in result.alternatives[0].words:
        print(word_info.word, word_info.start_time, word_info.end_time)
This would give you access to the raw Google response.
Sure thing - I'd love to take a stab at it.
Ultimately I want to search for keyword phrases in order to automate editing audio clips.
For example, I will record audio for a list of items, saying something like "Item 1 start", then talk about item 1, and when done, say "Item 1 end".
From there I'd use transcribe to get the timestamps. I could then use another program to create separate clips, one for each item.
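A rough sketch of the marker search described above, assuming the per-word timestamps have already been collected as (text, start_seconds, end_seconds) tuples. The function name find_item_clips is hypothetical, and the sketch assumes the recognizer writes the item number as digits ("1" rather than "one").

```python
import re


def find_item_clips(words):
    """Map each item number to the (start, end) seconds of the audio between its markers.

    words: list of (text, start_seconds, end_seconds) tuples in spoken order.
    """
    # Normalize tokens so "Item" and "start," still match the marker words.
    tokens = [(re.sub(r"\W", "", text).lower(), start, end) for text, start, end in words]
    open_items = {}
    clips = {}
    for i in range(len(tokens) - 2):
        first, number, marker = tokens[i], tokens[i + 1], tokens[i + 2]
        if first[0] != "item" or not number[0].isdigit():
            continue
        item = int(number[0])
        if marker[0] == "start":
            # The clip begins once the "start" marker has finished being spoken.
            open_items[item] = marker[2]
        elif marker[0] == "end" and item in open_items:
            # The clip ends where the "Item N end" marker begins.
            clips[item] = (open_items.pop(item), first[1])
    return clips


# Hypothetical timestamps, for illustration only.
words = [
    ("Item", 0.0, 0.3), ("1", 0.3, 0.5), ("start", 0.5, 0.9),
    ("this", 1.0, 1.2), ("is", 1.2, 1.3), ("great", 1.3, 1.8),
    ("Item", 2.0, 2.3), ("1", 2.3, 2.5), ("end", 2.5, 2.8),
]
print(find_item_clips(words))
```

The resulting start and end offsets could then be handed to whatever program does the actual cutting.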
...
Hopefully I'll have time to work on this soon. I'll definitely have questions.