DeepSpeech Memory Leak using Java Native Client

trafficstars

For support and discussions, please use our Discourse forums.

If you've found a bug, or have a feature request, then please create an issue with the following information:

Have I written custom code (as opposed to running examples on an unmodified clone of the repository):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Android R (API 30)
Exact command to reproduce:

I modified the STT-example in Kotlin for use in a Mozilla-STT React Native module I'm building. However, running streaming transcription uses more and more memory until the app crashes. See a fully working example by cloning this repo, running yarn bootstrap, then yarn example android. Attach to it using the Android Studio profiler to watch the memory usage creep upwards.

Implementation: https://github.com/zaptrem/react-native-transcription/blob/master/android/src/main/java/com/reactnativetranscription/TranscriptionModule.kt

Please describe the problem clearly. Be sure to convey here why it's a bug or a feature request.

Include any logs or source code that would be helpful to diagnose the problem. For larger logs, link to a Gist, not a screenshot. If including tracebacks, please include the full traceback. Try to provide a reproducible test case.

Aug 24 '20 01:08 zaptrem

However, running streaming transcription uses more and more memory until the app crashes.

I know nothing about Kotlin and its memory model, can you ensure this is 100% NOT related to your usage of the library?

Aug 24 '20 08:08 lissyx

However, running streaming transcription uses more and more memory until the app crashes.

I know nothing about Kotlin and its memory model, can you ensure this is 100% NOT related to your usage of the library? @lissyx

I, too, know nothing about Kotlin and its memory model apart from what I learned building this module, which I started only a few days ago. However, my usage is nearly line-for-line identical to the STT-examples Android demo. The only exception is my packaging of the Metadata object and Native Event dispatching. However, the leak remains even when those are commented out.

Aug 24 '20 08:08 zaptrem

However, running streaming transcription uses more and more memory until the app crashes.

I know nothing about Kotlin and its memory model, can you ensure this is 100% NOT related to your usage of the library? @lissyx

I, too, know nothing about Kotlin and its memory model apart from what I learned building this module, which I started only a few days ago. However, my usage is nearly line-for-line identical to the STT-examples Android demo. The only exception is my packaging of the Metadata object and Native Event dispatching. However, the leak remains even when those are commented out.

My point being: I have no idea whether the STT-examples code is leak-free, and in the current state of events, I can't focus time on investigate that on JNI/Java side.

Aug 24 '20 08:08 lissyx

@lissyx Are there other contributors with experience in this area who might be good to cc?

Aug 24 '20 08:08 zaptrem

The kotlin example was contributed by @erksch

Aug 24 '20 08:08 lissyx

@lissyx @erksch the Android Studio debugger reports that the memory leak is in the "Native Code" section. Doesn't this indicate that it's a library issue? I'm running multi-hour recognition jobs, so maybe the issue doesn't show up in your shorter test cases?

Aug 26 '20 01:08 zaptrem

@lissyx @erksch the Android Studio debugger reports that the memory leak is in the "Native Code" section. Doesn't this indicate that it's a library issue? I'm running multi-hour recognition jobs, so maybe the issue doesn't show up in your shorter test cases?

Again, I know nothing about that tooling. If it's the case, then can you write C code reproducing ? How can we distinguish "native" from our library from swig wrapper?

Aug 26 '20 08:08 lissyx

@lissyx Doesn't the SWIG count as part of DeepSpeech's repo/code? The RN module I was building is basically ready for its first release, but I don't want to move forward until we can resolve this issue. I have an example above that's extremely easy to reproduce (one command and two clicks in the emulator). Is writing a whole C interface (while closely studying the SWIG so that they match) with no prior experience absolutely necessary to move forward on fixing this bug?

Also of importance, commenting out model.feedAudioContent() in the AudioRecorder loop seems to solve the memory leak.

@erksch (and lissyx) I can confirm that the exact same issue is present on the android-streaming-demo in STT-examples. This is more evidence that the problem isn't just on my end, right?

Aug 30 '20 05:08 zaptrem

You seem not to understand that i really have no time right now to track a memory leak from the top down.

Aug 30 '20 08:08 lissyx

have an example above that's extremely easy to reproduce (one command and two clicks in the emulator). Is writing a whole C interface (while closely studying the SWIG so that they match) with no prior experience absolutely necessary to move forward on fixing this bug?

Share and we'll see.

Aug 30 '20 08:08 lissyx

have an example above that's extremely easy to reproduce (one command and two clicks in the emulator). Is writing a whole C interface (while closely studying the SWIG so that they match) with no prior experience absolutely necessary to move forward on fixing this bug?

Share and we'll see.

See OP:

See a fully working example by cloning this repo, running yarn bootstrap, then yarn example android. Attach to it using the Android Studio profiler to watch the memory usage creep upwards.

The STT-examples implementation might be easier to follow, though.

https://github.com/mozilla/DeepSpeech-examples/tree/r0.8/android_mic_streaming

If nobody can fix it now could it be marked as a bug so it doesn’t get forgotten about later?

Aug 30 '20 08:08 zaptrem

See a fully working example by cloning this repo, running yarn bootstrap, then yarn example android. Attach to it using the Android Studio profiler to watch the memory usage creep upwards.

The STT-examples implementation might be easier to follow, though.

Sorry but this is way too much.

The bug could be:

in the c code it self
in the c generated by swig
in the Java libdeepspeech part
in the Java using it

Please try and repro to reduce the search space.

Aug 30 '20 08:08 lissyx

Also, please understand we have had a very stressful month and it's the week-end, i might want to just rest for a few hours more ?

Aug 30 '20 08:08 lissyx

If nobody can fix it now could it be marked as a bug so it doesn’t get forgotten about later?

It needs to be investigated to know if the bug is really on our side...

Aug 30 '20 08:08 lissyx

@erksch (and lissyx) I can confirm that the exact same issue is present on the android-streaming-demo in STT-examples. This is more evidence that the problem isn't just on my end, right?

Again, i have no idea, you are both using kotlin and i have not thoroughly verified your logic..

Aug 30 '20 08:08 lissyx

Also, please understand we have had a very stressful month and it's the week-end, i might want to just rest for a few hours more ?

I totally understand. It’s 5am where I am and I’m just trying to get a few tasks (this was the last) checked off before I can sleep. It’s an issue forum and not text messages, so I don’t expect instant responses. I just want to make the information I have available to all of you as it comes to me.

Not much more digging I can do on this front for now as there’s a big launch on Tuesday coming up.

Aug 30 '20 08:08 zaptrem

Not much more digging I can do on this front for now as there’s a big launch on Tuesday coming up

Thanks for your understanding. Please note if you can find to repro in plain C on linux/amd64 it can be much quickly investigated.

I also hope we can add debug build to our CI now that we have had some work on it, that would help on such case.

Aug 30 '20 09:08 lissyx

@zaptrem Hey sorry for being absent maybe I can look at it in the next time. But if its deep down in DeepSpeech I will not be able to help either, but I could at least reproduce it and make me my own picture.

But I guess a memory leak could easily be in the android example code because maybe some buffers are not cleaned up properly. But that only commenting out feedAudioContent solves the memory leak makes me worry that the problem is further down.

Aug 30 '20 09:08 erksch

But that only commenting out feedAudioContent solves the memory leak makes me worry that the problem is further down.

Ftr we have tracked down other memory leak recently and this code path never showed, so it'd be great if we can confirm it is on Java, swig or c side.

Aug 30 '20 10:08 lissyx

As I feared, lots of things are breaking on TensorFlow side when trying to build a debug version on CI and on setups where it is more complicated to investigate (Windows CPU and CUDA debug builds) ...

Aug 31 '20 09:08 lissyx

@zaptrem even getting android debug build requires non trivial amount of work, there are several tensorflow r2.3 debug build specific breakages reported upstream, some not yet with a fix, some only fixed on master. This, added to the complex stack up to kotlin makes me unable to quickly investigate: we really need your help here into trying and reproduce the bogus behavior with pure linux/amd64 builds.

Sep 02 '20 12:09 lissyx

@zaptrem @erksch Have you been able to get more informations on that matter ?

Sep 08 '20 15:09 lissyx

@zaptrem ping? Have you been able to qualify better the issue? I have been able to have a look, comparing non stream vs stream uses via the C++ client,

(this is on current master, c7d58d628e6f701ce3575a44ef9282730ec02ab3)

Non Stream:

==1401849== LEAK SUMMARY:
==1401849==    definitely lost: 0 bytes in 0 blocks
==1401849==    indirectly lost: 0 bytes in 0 blocks
==1401849==      possibly lost: 0 bytes in 0 blocks
==1401849==    still reachable: 8,260 bytes in 103 blocks
==1401849==         suppressed: 0 bytes in 0 blocks

The still reachable are all from libsox according to what I could analyze.

Stream:

==2125011== LEAK SUMMARY:
==2125011==    definitely lost: 20 bytes in 3 blocks
==2125011==    indirectly lost: 0 bytes in 0 blocks
==2125011==      possibly lost: 0 bytes in 0 blocks
==2125011==    still reachable: 8,260 bytes in 103 blocks
==2125011==         suppressed: 0 bytes in 0 blocks

And it would come from the intermediate decode:

 240 ==2125011== 20 bytes in 3 blocks are definitely lost in loss record 17 of 104                                                                                                                                                                                                                                                                                            241 ==2125011==    at 0x483877F: malloc (vg_replace_malloc.c:307)                                                                                                                                                                                                                                                                                                            242 ==2125011==    by 0x5E93F0A: strdup (strdup.c:42)                                                                                                                                                                                                                                                                                                                        243 ==2125011==    by 0x4A12003: ModelState::decode(DecoderState const&) const (modelstate.cc:36)                                                                                                                                                                                                                                                                            244 ==2125011==    by 0x4A067AD: StreamingState::intermediateDecode() const (deepspeech.cc:137)                                                                                                                                                                                                                                                                             
 245 ==2125011==    by 0x4A0776B: DS_IntermediateDecode (deepspeech.cc:458)                                                                                                                                                                                                                                                                                                   246 ==2125011==    by 0x11ADBF: LocalDsSTT(ModelState*, short const*, unsigned long, bool, bool) (in /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/DeepSpeech-lissyx/native_client/deepspeech)                                                                                                                                                                          247 ==2125011==    by 0x11B3A7: ProcessFile(ModelState*, char const*, bool) (in /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/DeepSpeech-lissyx/native_client/deepspeech)                                                                                                                                                                                               248 ==2125011==    by 0x11BB05: main (in /home/alexandre/Documents/codaz/Mozilla/DeepSpeech/DeepSpeech-lissyx/native_client/deepspeech)

The definitely lost seems to be a hint but the leak is low (but only one file analyzed).

Does this matches your experience? Do you have higher volume of leaking?

Sep 29 '20 11:09 lissyx

One can verify the JNI part:

$ less native_client/java/jni/deepspeech_wrap.cpp
[...]
SWIGEXPORT jstring JNICALL Java_org_mozilla_deepspeech_libdeepspeech_implJNI_IntermediateDecode(JNIEnv *jenv, jclass jcls, jlong jarg1) {
  jstring jresult = 0 ;
  StreamingState *arg1 = (StreamingState *) 0 ;
  char *result = 0 ;
  
  (void)jenv;
  (void)jcls;
  arg1 = *(StreamingState **)&jarg1; 
  result = (char *)DS_IntermediateDecode((StreamingState const *)arg1);
  if (result) jresult = jenv->NewStringUTF((const char *)result);
  DS_FreeString(result);
  return jresult;
}
[...]

So there is an explicit call to DS_FreeString

Sep 29 '20 12:09 lissyx

The definitely lost seems to be a hint but the leak is low (but only one file analyzed).

And I can confirm this is because of the implementation in client.cc, JNI part seems to be okay.

Sep 29 '20 12:09 lissyx

That’s great news, right?

On Tue, Sep 29, 2020 at 8:42 AM lissyx [email protected] wrote:

The definitely lost seems to be a hint but the leak is low (but only one file analyzed).

And I can confirm this is because of the implementation in client.cc, JNI part seems to be okay.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mozilla/DeepSpeech/issues/3271#issuecomment-700676669, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMJTRRSOEBSXOLNQ2SMEG3SIHI4HANCNFSM4QI7OFOQ .

Sep 29 '20 13:09 zaptrem

That’s great news, right? … On Tue, Sep 29, 2020 at 8:42 AM lissyx @.***> wrote: The definitely lost seems to be a hint but the leak is low (but only one file analyzed). And I can confirm this is because of the implementation in client.cc, JNI part seems to be okay. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#3271 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMJTRRSOEBSXOLNQ2SMEG3SIHI4HANCNFSM4QI7OFOQ .

That depends, if you confirm or not my analysis ...

Sep 29 '20 13:09 lissyx

Unfortunately:

tensorflow does not seems to care much about debug builds, given how -c dbg fails mostly everywhere on CI
tensorflow or bazel does not respect --strip=never nor anything
so there's mostly no way to produce debug builds / symbols on CI (i've already burnt a few days on that, with no success)
there's also no way to separately generate debug symbols, so it's mostly useless to run valgrind on our CI

Sep 30 '20 10:09 lissyx

I might have found a middle-ground to provide debug builds + valgrind, with the following limitations:

valgrind only on linux/amd64
debug builds on all linux flavors, as much as I can hope, but still to be confirmed
this should also cover android
this might work on macOS and Windows, but I will not spend too much time on that
CUDA might be too painful for that (libdeepspeech.so for CUDA > 2GB, that is triggering infra issues)
you would have to use those libdeepspeech.so debug build by swapping the libdeepspeech.so inside packages, building full blown two flavors efficiently would require too much CI work

Sep 30 '20 19:09 lissyx

@zaptrem I know nothing about kotlin nor your code, but there is something I don't get there:

    model?.let { model ->
      streamContext?.let { streamContext ->
        val shortArray = ShortArray(audioRecordData.size / 2) {
          (audioRecordData[it * 2].toUByte().toInt() + (audioRecordData[(it * 2) + 1].toInt() shl 8)).toShort()
        }
        model.feedAudioContent(streamContext, shortArray, shortArray.size)
        val decoded = model.intermediateDecodeWithMetadata(streamContext, 1)
        val decodedString = model.intermediateDecode(streamContext)

        if(decodedString != lastTranscription){
          lastTranscription = decodedString
          val map = packageTranscription(decoded)
          emitDeviceEvent("onRecordingChange", map)
          Log.d("transcription", decodedString)
        }
      }
    }

Aside, it took me half an hour navigating in your code to find where the calls where. We would have saved everybody a lot of time if you had directly linked to the "streaming transcription uses" you referred to instead of just linking to the repo.

First, why do you call both intermediateDecodeWithMetadata and intermediateDecode ?

Second, I don't see any finalize() nor delete() call on the Metadata object, which is how the bindings will call DS_FreeMetadata() and that is, as much as we know, the canonical way to handle that in Java. I have no idea if it is up to you or the VM's job to call delete().

$ cat native_client/java/libdeepspeech/src/main/java/org/mozilla/deepspeech/libdeepspeech/Metadata.java 
/* ----------------------------------------------------------------------------
 * This file was automatically generated by SWIG (http://www.swig.org).
 * Version 4.0.2
 *
 * Do not make changes to this file unless you know what you are doing--modify
 * the SWIG interface file instead.
 * ----------------------------------------------------------------------------- */

package org.mozilla.deepspeech.libdeepspeech;

public class Metadata {
  private transient long swigCPtr;
  protected transient boolean swigCMemOwn;

  protected Metadata(long cPtr, boolean cMemoryOwn) {
    swigCMemOwn = cMemoryOwn;
    swigCPtr = cPtr;
  }

  protected static long getCPtr(Metadata obj) {
    return (obj == null) ? 0 : obj.swigCPtr;
  }

  @SuppressWarnings("deprecation")
  protected void finalize() {
    delete();
  }

  public synchronized void delete() {
    if (swigCPtr != 0) {
      if (swigCMemOwn) {
        swigCMemOwn = false;
        implJNI.delete_Metadata(swigCPtr);
      }
      swigCPtr = 0;
    }
  }

  public long getNumTranscripts() {
    return implJNI.Metadata_NumTranscripts_get(swigCPtr, this);
  }

  public CandidateTranscript getTranscript(int i) {
    return new CandidateTranscript(implJNI.Metadata_getTranscript(swigCPtr, this, i), false);
  }

}

So if DS_FreeMetadata() is never called, it would not be surprising you observe a leak, but it is obviously not a bug on our side if that is the case.

Sep 30 '20 19:09 lissyx

DeepSpeech DeepSpeech copied to clipboard

Memory Leak using Java Native Client

DeepSpeech
DeepSpeech copied to clipboard