gspeech-api
gspeech-api copied to clipboard
Node.js wrapper library to easily access Google Speech Recognition API
GSpeech API
A node.js wrapper library around the Google Speech API for automatic speech recognition.
Easy-to-use high-quality speech recognition
- Unlimited requests to Google automatic speech recognition servers
- Access to timed transcripts
Basic Usage
For clips under 60 seconds, usage is simple:
var gspeech = require('gspeech-api');
gspeech.recognize('path/to/my/file', function(err, data) {
if (err)
console.error(err);
console.log("Final transcript is:\n" + data.transcript);
});
Google servers ignore clips over 60 seconds, so for clips longer than that, you have to specify how you want your audio files split into pieces. To use default package settings, the same code from above for clips under 60 seconds works.
The speed varies, but in general, one hour of audio will take a couple minutes to process.
Installation
Gpseech-api relies on fluent-ffmpeg to deal with different audio formats, which has a dependency on ffmpeg.
Unfortunately, ffmpeg is a little bit tricky to install on Ubuntu 12.04 and 14.04. I followed the instructions to install from source from the ffmpeg Installation page.
All other dependencies should be automatically handled by npm.
npm install gspeech-api
Documentation
This package exposes one main method, gspeech.recognize(options, callback)
for taking a file and returning an array of timed captions along with a final transcript.
Arguments
Options
If options
is passed as a String
, it is taken as the path for file
. Otherwise, it should be an Object, which can have the following attributes:
-
file
- path to the audio file to be transcribed. -
segments
(optional) -[start]
Specifies how to divide the audio file into segments before transcription.start
is afloat
specifying a track time in seconds. If this argument is not specified, the audio is split into 60 second segments (the maximum length allowed by Google's servers). If a segment is longer than 60 seconds, it is split into 60 second segments. -
maxDuration
(optional) - any segments longer thanmaxDuration
will be split into segments ofmaxDuration
seconds. Defaults to 15 seconds. -
maxRetries
(optional) - sometimes Google servers do not respond correctly. If a segment is not processed correctly, it will be sent againmaxRetries
more times. Defaults to 1. -
limitConcurrent
(optional) - Google's servers communicate through separate GET and POST requests - sending too many requests at once may cause the two requests to not line up, resulting in errors. This defaults to 20.
Callback
Callback is a function which will be called after all requests to Google speech servers have completed. It is passed two parameters callback(err, data)
:
-
err
- contains an error message, if error occurs -
data
- contains an Object with the following fields, if all requests are successful:-
transcript
-String
: text of complete transcript -
timedTranscript
-[{'start': float, 'text': String}]
: unprocessed text of each segment transcribed separately by Google speech servers. Ifoptions.segments
is specified, each object corresponds to a segment; otherwise, each object designates how the audio file was split before being transcribed.
-
More Examples
Getting a timed transcript
gspeech.recognize('path/to/my/file', function(err, data) {
if (err)
console.error(err);
for (var i = 0; i < data.timedTranscript.length; i++) {
// Print the transcript
console.log(data.timedTranscript[i].start + ': '
+ data.timedTranscript[i].text + '\n');
}
});
Specifying times to split audio
If you would like to generate a timed transcript, and know where fragments start, specify these times to the library.
var segTimes = [0, 15, 20, 30];
gspeech.recognize({
'file': 'path/to/my/file',
'segments': segTimes,
},
function(err, data) {
if (err)
console.error(err);
for (var i = 0; i < data.timedTranscript.length; i++) {
console.log(data.timedTranscript[i].start + ': '
+ data.timedTranscript[i].text + '\n');
}
}
);
Disclaimer
This is not an officially supported Google API, and should only be used for personal purposes. The API is subject to change, and should not be relied upon by any crucial services. I'm also not actively maintaining this repo - for modifications, the best option is probably to make your own fork.