php-docs-samples
Could not find example for Speaker diarization?
Hi folks, I am having a hard time getting data for multiple speakers, and there is no example for it. The official Google docs have no PHP example either, as you can see here: https://cloud.google.com/speech-to-text/docs/multiple-voices.
use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\RecognitionAudio;
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;
use Google\Cloud\Speech\V1\SpeakerDiarizationConfig;
/** Uncomment and populate these variables in your code */
// $audioFile = 'path to an audio file';
// change these variables if necessary
$encoding = AudioEncoding::LINEAR16;
$sampleRateHertz = 32000;
$languageCode = 'en-US';
if (!extension_loaded('grpc')) {
    throw new \Exception('Install the grpc extension (pecl install grpc)');
}
// When true, time offsets for every word will be included in the response.
$enableWordTimeOffsets = true;
// get contents of a file into a string
$content = file_get_contents($audioFile);
// set string as audio content
$audio = (new RecognitionAudio())
    ->setContent($content);
// changes that I made for different speakers
$speakerDiarizationConfig = (new SpeakerDiarizationConfig())
    ->setEnableSpeakerDiarization(true)
    ->setMinSpeakerCount(2)
    ->setMaxSpeakerCount(6);
// set config
$config = (new RecognitionConfig())
    ->setEncoding($encoding)
    ->setSampleRateHertz($sampleRateHertz)
    ->setLanguageCode($languageCode)
    ->setEnableWordTimeOffsets($enableWordTimeOffsets)
    ->setDiarizationConfig($speakerDiarizationConfig); // changes that I made for different speakers
// create the speech client
$client = new SpeechClient();
// create the asynchronous recognize operation
$operation = $client->longRunningRecognize($config, $audio);
$operation->pollUntilComplete();
if ($operation->operationSucceeded()) {
    $response = $operation->getResult();
    // Each result is for a consecutive portion of the audio. Iterate
    // through them to get the transcripts for the entire audio file.
    foreach ($response->getResults() as $result) {
        $alternatives = $result->getAlternatives();
        $mostLikely = $alternatives[0];
        foreach ($mostLikely->getWords() as $wordInfo) {
            $startTime = $wordInfo->getStartTime();
            $endTime = $wordInfo->getEndTime();
            printf(' Speaker %u Word: %s (start: %s, end: %s)' . PHP_EOL,
                $wordInfo->getSpeakerTag(), // changes that I made for different speakers
                $wordInfo->getWord(),
                $startTime->serializeToJsonString(),
                $endTime->serializeToJsonString());
        }
    }
} else {
    print_r($operation->getError());
}
$client->close();
Output: Speaker %u Word: %s (start: %s, end: %s)
Speaker 0 this (start: "0s", end: "0.5s")
Speaker 0 is (start: "0.5s", end: "1.5s")
Speaker 0 an (start: "1.5s", end: "2.5s")
Speaker 0 entire (start: "2s", end: "3.5s")
Speaker 0 audio (start: "3.5s", end: "4.5s")
Speaker 0 sentence (start: "4.5s", end: "5.5s")
Speaker 0 that (start: "5.5s", end: "6.5s")
Speaker 0 google (start: "6.5s", end: "7.5s")
Speaker 0 give (start: "7.5s", end: "8.5s")
Speaker 0 me (start: "8.5s", end: "9.5s")
Speaker 0 in (start: "9.5s", end: "10.5s")
Speaker 0 its (start: "10.5s", end: "11.5s")
Speaker 0 response (start: "11.5s", end: "12.5s")
Speaker 1 this (start: "0s", end: "0.5s")
Speaker 1 is (start: "0.5s", end: "1.5s")
Speaker 1 an (start: "1.5s", end: "2.5s")
Speaker 1 entire (start: "2s", end: "3.5s")
Speaker 1 audio (start: "3.5s", end: "4.5s")
Speaker 1 sentence (start: "4.5s", end: "5.5s")
Speaker 1 that (start: "5.5s", end: "6.5s")
Speaker 1 google (start: "6.5s", end: "7.5s")
Speaker 1 give (start: "7.5s", end: "8.5s")
Speaker 1 me (start: "8.5s", end: "9.5s")
Speaker 1 in (start: "9.5s", end: "10.5s")
Speaker 1 its (start: "10.5s", end: "11.5s")
Speaker 1 response (start: "11.5s", end: "12.5s")
Speaker 3 this (start: "0s", end: "0.5s")
Speaker 3 is (start: "0.5s", end: "1.5s")
Speaker 3 an (start: "1.5s", end: "2.5s")
Speaker 3 entire (start: "2s", end: "3.5s")
Speaker 3 audio (start: "3.5s", end: "4.5s")
Speaker 3 sentence (start: "4.5s", end: "5.5s")
Speaker 3 that (start: "5.5s", end: "6.5s")
Speaker 3 google (start: "6.5s", end: "7.5s")
Speaker 3 give (start: "7.5s", end: "8.5s")
Speaker 3 me (start: "8.5s", end: "9.5s")
Speaker 3 in (start: "9.5s", end: "10.5s")
Speaker 3 its (start: "10.5s", end: "11.5s")
Speaker 3 response (start: "11.5s", end: "12.5s")
For the sake of simplicity I cut off part of the response. First problem: as you can see, the speakerTag values look wrong. The audio I am sending in the request has 5 speakers, but the response gives me 0 and 1 and then jumps to 3. I don't know why Google is not responding with speaker tags 0, 1, 2, 3, and 4. Second problem: Google responds with the entire audio text attributed to a single speaker, and then repeats it for the next speaker, as you can see in my output. I can't figure out whether this is a problem with my code or something else. I hope my problem is clear.
Hi there! Yes, we'd love to see your code in PHP for separating different voices! Feel free to post your code snippets here, or to submit a pull request!
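Regarding the two problems described in the question, here is one possible approach, offered as a sketch rather than a confirmed fix. As far as I understand the Speech-to-Text documentation, when diarization is enabled the word list in the top alternative of the last result covers all words recognized so far together with their speaker tags, so looping over every result (as the code above does) can print the same words repeatedly. I also assume that a tag of 0 means no speaker was assigned and that tags need not be contiguous; please verify both against your own response. The helper names `speakerTaggedWords` and `groupIntoTurns` and the sample words are hypothetical, not part of the client library.

```php
<?php
/**
 * Collect speaker-tagged words from the LAST result only.
 * Assumption: with diarization enabled, the last result's top
 * alternative lists every word with its speaker tag.
 * $response is the object returned by $operation->getResult().
 */
function speakerTaggedWords($response): array
{
    $results = iterator_to_array($response->getResults(), false);
    if (count($results) === 0) {
        return [];
    }
    $alternatives = $results[count($results) - 1]->getAlternatives();
    if (count($alternatives) === 0) {
        return [];
    }
    $words = [];
    foreach ($alternatives[0]->getWords() as $wordInfo) {
        $words[] = [
            'speaker' => $wordInfo->getSpeakerTag(),
            'word' => $wordInfo->getWord(),
        ];
    }
    return $words;
}

/**
 * Merge consecutive words that share a speaker tag into one turn.
 * Works on plain arrays, so it can be tried without the API.
 */
function groupIntoTurns(array $words): array
{
    $turns = [];
    foreach ($words as $w) {
        $last = count($turns) - 1;
        if ($last >= 0 && $turns[$last]['speaker'] === $w['speaker']) {
            // Same speaker keeps talking: extend the current turn.
            $turns[$last]['text'] .= ' ' . $w['word'];
        } else {
            // Speaker changed (or first word): start a new turn.
            $turns[] = ['speaker' => $w['speaker'], 'text' => $w['word']];
        }
    }
    return $turns;
}

// Hypothetical sample data, not taken from a real API response.
$sampleWords = [
    ['speaker' => 1, 'word' => 'hello'],
    ['speaker' => 1, 'word' => 'there'],
    ['speaker' => 3, 'word' => 'hi'],
    ['speaker' => 1, 'word' => 'bye'],
];

foreach (groupIntoTurns($sampleWords) as $turn) {
    printf('Speaker %u: %s' . PHP_EOL, $turn['speaker'], $turn['text']);
}
```

With a real response, `groupIntoTurns(speakerTaggedWords($response))` would replace the sample data. Grouping by consecutive tag rather than by tag value keeps the conversation order intact, and it does not depend on the tags being numbered 0 through 4.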