DeepSpeech Bad Encoding Output for Chinese Pretrained Model using JavaScript Bindings

Both versions are installed from the npm package. TensorFlow: v2.3.0-6-g23ad988 DeepSpeech: v0.9.3-0-gf2e9c85

Tried in both these environments:

Windows 10.0.19042 / Python 3.9.0
WSL2 Debian 10 / Python 3.7.3

My code is included below. I am going to work around this by not using the npm bindings; I haven't tried any other bindings. I just wanted to report it here so that the team is aware. When I followed the Getting Started instructions on https://deepspeech.readthedocs.io/, the pip-installed command-line tool does in fact work. The problem only appears with the deepspeech npm package. The pretrained English model/scorer works perfectly there; only the pretrained Chinese model provided on the release page produces bad output.

Thank you!

const DeepSpeech = require('deepspeech');
const MemoryStream = require('memory-stream');
const { readFileSync, writeFileSync } = require('fs');
const { Duplex } = require('stream');

const chineseModelName = 'deepspeech-0.9.3-models-zh-CN.pbmm';
const chineseScorerName = 'deepspeech-0.9.3-models-zh-CN.scorer';

const model = new DeepSpeech.Model(`models/${chineseModelName}`);

model.enableExternalScorer(`models/${chineseScorerName}`);

const buffer = readFileSync('test-data/chinese.wav');
const audioStream = new MemoryStream();

var stream = new Duplex();
stream.push(buffer);
stream.push(null);
stream.pipe(audioStream);

audioStream.on('finish', () => {
  let audioBuffer = audioStream.toBuffer();

  const metadata = model.sttWithMetadata(audioBuffer, 10);

  const output = metadata.transcripts
    .map((transcript) => {
      return transcript.tokens
        .map((token) => token.text)
        .join('')
    })
    .join('\n');

  writeFileSync('test-data/chinese.txt', output);
  console.log(output);

  DeepSpeech.FreeMetadata(metadata);
  DeepSpeech.FreeModel(model);
  process.exit(0);
});

Mar 24 '21 04:03 hunterwebapps

Thanks but you should be more explicit on the expected and actual output, or link your discourse thread ...
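One way to make the expected and actual output unambiguous in a report is to list the Unicode code points of each string, since replacement characters are easy to lose when copying text between terminals and browsers. This is a minimal sketch; `toCodePoints` is a hypothetical helper, not part of the DeepSpeech bindings.

```javascript
// Hypothetical helper: render a string as a list of Unicode code points,
// e.g. "U+6211 U+4F1A", so that mojibake is visible in a bug report.
function toCodePoints(str) {
  const points = [];
  // for...of iterates by code point, so astral characters stay intact.
  for (const ch of str) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
    points.push(`U+${hex}`);
  }
  return points.join(' ');
}

// Expected transcript versus the kind of output described in this issue.
console.log('expected:', toCodePoints('我会说中文。'));
console.log('actual:  ', toCodePoints('\uFFFD\uFFFD'));
```

U+FFFD is the Unicode replacement character, which decoders emit when they meet bytes that are not valid in the assumed encoding.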

Mar 24 '21 07:03 lissyx

@hunterwebapps do you repro with https://community-tc.services.mozilla.com/api/queue/v1/task/aYXzKHZ9RGG-WyH5MeVdPg/runs/0/artifacts/public%2Fdeepspeech-0.10.0-alpha.3.tgz ?

Mar 24 '21 07:03 lissyx

Here's the discourse thread. https://discourse.mozilla.org/t/pretrained-chinese-model-invalid-inference-output/77439/5

It seems to be specific to the npm package. It outputs badly encoded text, like �� instead of the expected 我会说中文。 When I used the Python command-line tool, the output was as expected.
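For anyone trying to picture how valid Chinese text can turn into replacement characters, here is a small illustration in plain Node.js. It is not a confirmed diagnosis of the bindings; it only shows that decoding the bytes of a UTF-8 string one at a time produces U+FFFD for every byte, while decoding the same bytes together gives the original text. Whether something similar happens inside the npm package is an assumption that has not been verified in this thread.

```javascript
// Illustration only: uses Node's built-in Buffer, no DeepSpeech calls.
const text = '我会说中文。';
const bytes = Buffer.from(text, 'utf8'); // each character here is 3 bytes

// Decoding each byte on its own: no single byte is valid UTF-8 by itself,
// so every byte becomes the replacement character U+FFFD.
const perByte = Array.from(bytes, (b) => Buffer.from([b]).toString('utf8')).join('');

// Decoding all bytes together restores the original string.
const whole = bytes.toString('utf8');

console.log(bytes.length);     // 18
console.log(perByte === '\uFFFD'.repeat(bytes.length)); // true
console.log(whole === text);   // true
```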

I can work without the JavaScript bindings, but I just wanted to report it here so the team is aware and so anybody else who is looking (like I was) can at least find some info. Thanks!

Mar 24 '21 21:03 hunterwebapps

Right, so if you can give the link above a try, it might help us: these are the current master bindings, built with a newer SWIG version in which the NodeJS incompatibilities we carried patches for on our SWIG fork have been (properly) fixed.

So hopefully, the issue might have been in our patches. If that's the case, this newer npm package would fix it.

Mar 24 '21 21:03 lissyx

Ah! I misunderstood. I just ran the updated version and am actually getting no output. It's telling me that my audio files are 0 seconds in length. I am still using the 0.9.3 models, and I tried both English (which was working with the corresponding release version) and Chinese. Neither worked (both report the audio files as 0 seconds long). I looked for updated models to go with the new alpha versions, but there don't appear to be any available. I also tried modifying the release URL to the one below, which didn't work. No surprise there, but I figured I'd try.

https://github.com/mozilla/DeepSpeech/releases/download/v0.10.0-alpha.3/deepspeech-0.10.0-alpha.3-models-zh-CN.pbmm
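To rule out the audio files themselves when something reports a length of 0 seconds, the duration can be computed directly from the WAV header, independently of DeepSpeech. The sketch below assumes an uncompressed RIFF/WAVE file; `wavDurationSeconds` and `makeSilentWav` are hypothetical helpers written for this example, and the check says nothing about why the alpha bindings report 0 seconds.

```javascript
// Hypothetical helper: walk the RIFF chunks, read the byte rate from "fmt "
// and the payload size from "data", and return the duration in seconds.
function wavDurationSeconds(buf) {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE buffer');
  }
  let offset = 12;
  let byteRate = null;
  let dataSize = null;
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    // Byte rate sits 8 bytes into the fmt chunk body.
    if (id === 'fmt ') byteRate = buf.readUInt32LE(offset + 8 + 8);
    // Clamp to the bytes actually present in case the header overstates it.
    if (id === 'data') dataSize = Math.min(size, buf.length - (offset + 8));
    offset += 8 + size + (size % 2); // chunks are padded to even sizes
  }
  if (!byteRate || dataSize === null) throw new Error('missing fmt or data chunk');
  return dataSize / byteRate;
}

// Hypothetical helper: build a 16-bit mono PCM WAV of silence for testing.
function makeSilentWav(seconds, sampleRate = 16000) {
  const dataSize = seconds * sampleRate * 2;
  const buf = Buffer.alloc(44 + dataSize);
  buf.write('RIFF', 0, 'ascii');
  buf.writeUInt32LE(36 + dataSize, 4);
  buf.write('WAVE', 8, 'ascii');
  buf.write('fmt ', 12, 'ascii');
  buf.writeUInt32LE(16, 16);             // fmt chunk size
  buf.writeUInt16LE(1, 20);              // PCM
  buf.writeUInt16LE(1, 22);              // mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28); // byte rate
  buf.writeUInt16LE(2, 32);              // block align
  buf.writeUInt16LE(16, 34);             // bits per sample
  buf.write('data', 36, 'ascii');
  buf.writeUInt32LE(dataSize, 40);
  return buf;
}

console.log(wavDurationSeconds(makeSilentWav(2))); // 2
```

With a real file, the same function could be called on the result of `readFileSync('test-data/chinese.wav')`; a non-zero duration there would suggest the file is fine and the problem lies elsewhere.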

Mar 24 '21 22:03 hunterwebapps

Neither worked (both saying audio files are 0 sec long).

Those NPM packages were green on CI, so I'd suspect something weird on your side, but I can't tell for sure.

Mar 25 '21 13:03 lissyx

@hunterwebapps As you can see in #3317 and on https://github.com/mozilla/DeepSpeech/projects/13, we are in the process of moving to GitHub Actions. The current status is that we have a mostly end-to-end pipeline on macOS, but it does not cover the Mandarin work; if you are interested, adding test coverage there would be welcome.

Getting feedback from people on the new GitHub Actions flow is also super important to us, so it would be a perfect case.

Mar 31 '21 15:03 lissyx