
Enable load_from_bytes of model and ExtScorer

Open · bernardohenz opened this pull request 5 years ago • 22 comments

This PR allows loading the model and ExtScorer from an array of bytes (string)

Some observations:

  1. For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib
  2. I've added a new argument (`--init_from_bytes`) to the client for testing the loading
  3. I do not know how you guys want to handle the naming of methods, and whether you will want to keep backwards compatibility (with `DS_CreateModel`, for instance)
  4. I only tested loading .pb models. I will need some help implementing the loading of memmapped and tflite models.

bernardohenz avatar Sep 22 '20 12:09 bernardohenz
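
For context, a minimal sketch (not the PR's actual code) of the general pattern on the TensorFlow side: building a session from an in-memory serialized GraphDef instead of a file path, assuming the standard TensorFlow C++ API. The point is only that the whole model has to sit in memory as bytes while this happens, which is the allocation concern raised in the review below.

```cpp
// Minimal sketch, assuming the standard TensorFlow C++ API; simplified error
// handling, and not the code from this PR.
#include <memory>
#include <string>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/public/session.h"

tensorflow::Status CreateSessionFromBytes(const std::string& model_bytes,
                                          std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef graph_def;
  // Parse the serialized .pb directly from memory instead of reading a file path.
  if (!graph_def.ParseFromArray(model_bytes.data(), model_bytes.size())) {
    return tensorflow::errors::InvalidArgument("could not parse GraphDef from buffer");
  }
  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  // Session::Create converts the graph into its own representation, but both the
  // serialized bytes and the parsed GraphDef have to be held in memory up to here.
  return (*session)->Create(graph_def);
}
```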

No Taskcluster jobs started for this pull request

The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the
default branch) does not allow starting tasks for this pull request.

For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib

This will be problematic for compatibility with some systems.

I do not know how you guys want to handle the naming of methods, and whether you will want to keep backwards compatibility (with `DS_CreateModel`, for instance)

Please introduce a new `DS_CreateModelFromBuffer`, for example?

I only tested loading .pb models. I will need some help implementing the loading of memmapped and tflite models.

I really don't see how this feature can work well with the TensorFlow infra without forcing the allocation of huge strings.

lissyx avatar Sep 22 '20 14:09 lissyx

@bernardohenz I fail to see the use case where this can be useful for mmap-able file formats (PBMM, TFLite), given how much memory this is going to force TensorFlow to allocate.

lissyx avatar Sep 22 '20 14:09 lissyx

@lissyx #3152 is one example

reuben avatar Sep 22 '20 14:09 reuben

@lissyx #3152 is one example

Does it? From looking at the implementation, I'm unsure this can work without trashing the device's memory: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

lissyx avatar Sep 22 '20 14:09 lissyx

I do not know how you guys want to handle the naming of methods, and whether you will want to keep backwards compatibility (with `DS_CreateModel`, for instance)

Please introduce a new `DS_CreateModelFromBuffer`, for example?

Also, when you introduce that, please try and expose it in as many bindings as you can.

lissyx avatar Sep 22 '20 14:09 lissyx
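
To make that concrete, a hypothetical header sketch of how a buffer-based entry point could sit next to the existing path-based one; the `*FromBuffer` names and parameters here are illustrative, not the final API.

```cpp
#include <cstddef>

// Hypothetical sketch of additions to the native client header; the FromBuffer
// names and signatures are illustrative only, not the final API.
typedef struct ModelState ModelState;

// Existing path-based entry points stay untouched for backwards compatibility:
int DS_CreateModel(const char* aModelPath, ModelState** retval);
int DS_EnableExternalScorer(ModelState* aCtx, const char* aScorerPath);

// Possible buffer-based counterparts (sketch):
int DS_CreateModelFromBuffer(const char* aModelBuffer, size_t aBufferSize,
                             ModelState** retval);
int DS_EnableExternalScorerFromBuffer(ModelState* aCtx, const char* aScorerBuffer,
                                      size_t aBufferSize);
```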

Does it? From looking at the implementation, I'm unsure this can work without trashing the device's memory: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

That's a good point; maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.

reuben avatar Sep 22 '20 14:09 reuben
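
For reference, `tflite::FlatBufferModel::BuildFromBuffer` (the code linked above) wraps the caller-owned buffer without copying it, so whoever calls it has to keep the bytes alive at least as long as the model. A minimal sketch of that constraint, assuming the stock TFLite C++ API:

```cpp
// Minimal sketch, assuming the stock TensorFlow Lite C++ API; not code from this PR.
#include <memory>
#include <string>
#include <utility>

#include "tensorflow/lite/model.h"

struct BufferBackedModel {
  // BuildFromBuffer keeps a pointer into this storage instead of copying it,
  // so the bytes must outlive the FlatBufferModel (and any interpreter built on it).
  std::string bytes;
  std::unique_ptr<tflite::FlatBufferModel> model;
};

std::unique_ptr<BufferBackedModel> LoadFromBytes(std::string model_bytes) {
  auto holder = std::make_unique<BufferBackedModel>();
  holder->bytes = std::move(model_bytes);
  holder->model = tflite::FlatBufferModel::BuildFromBuffer(holder->bytes.data(),
                                                           holder->bytes.size());
  if (!holder->model) {
    return nullptr;
  }
  return holder;
}
```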

Does it? From looking at the implementation, I'm unsure this can work without trashing the device's memory: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

That's a good point; maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.

That's much more invasive and risky, except if we can upstream that, but we know how long it can take. Depending on @bernardohenz's use case, maybe it makes sense to just expose that for non-protobuf-mmap files? Or even only for TFLite models?

lissyx avatar Sep 23 '20 04:09 lissyx

This PR allows loading the model and ExtScorer from an array of bytes (string)

Some observations:

1. For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib

2. I've added a new argument (`--init_from_bytes`) to the client for testing the loading

3. I do not know how you guys want to handle the naming of methods, and whether you will want to keep backwards compatibility (with `DS_CreateModel`, for instance)

4. I only tested loading .pb models. I will need some help implementing the loading of memmapped and tflite models.

@bernardohenz I see you already have code for TFLite; can you describe what kind of help you need?

lissyx avatar Sep 23 '20 04:09 lissyx

@lissyx I will be addressing the implementation of the new methods for the API. The point where I'll need some help is implementing loading from a buffer for .pbmm and .tflite models. If I understood correctly, they would be much slower than loading from a path, but it is still better to have this option than no option at all (we could add a warning message clarifying that this kind of loading is slower). Although I had already implemented the code for TFLite (just because I had found `tflite::FlatBufferModel::BuildFromBuffer`), I haven't tested it yet; I have never compiled with TFLite and will try that next week.

I have no explanation for needing `D_GLIBCXX_USE_CXX11_ABI=1` when compiling. When it is set to 0, the libs end up with a segmentation fault. I still haven't figured out what `D_GLIBCXX_USE_CXX11_ABI` changes with respect to what I've implemented. If you have any thoughts about this, they would be much appreciated.

bernardohenz avatar Sep 24 '20 12:09 bernardohenz

If I understood correctly, they would be much slower than loading from a path, but it is still better to have this option than no option at all (we could add a warning message clarifying that this kind of loading is slower).

I want to make sure we don't put ourselves in a position where people will misuse it and have unrealistic expectations.

When it is set to 0, the libs end up with a segmentation fault. I still haven't figured out what `D_GLIBCXX_USE_CXX11_ABI` changes with respect to what I've implemented. If you have any thoughts about this, they would be much appreciated.

This does impact how C++ code handles strings, so maybe it depends on your env and in your case it makes sense? Could you share more of a stack trace, for example?

lissyx avatar Sep 24 '20 12:09 lissyx
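
Not a fix for the crash itself, but one common way to take the libstdc++ string ABI out of the equation is to keep `std::string` off the exported interface and pass a raw pointer plus length instead. A minimal, self-contained sketch with a hypothetical function name (not the DeepSpeech API):

```cpp
// Self-contained sketch with a hypothetical entry point (not the DeepSpeech API).
// Only POD types (const char* + size_t) cross the pretend library boundary, so the
// _GLIBCXX_USE_CXX11_ABI setting on either side cannot change the call's layout.
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Stand-in for an exported C function; a real library would declare this in a header.
extern "C" int Hypothetical_CreateModelFromBuffer(const char* buffer, size_t buffer_size) {
  return (buffer != nullptr && buffer_size > 0) ? 0 : 1;
}

int main(int argc, char** argv) {
  if (argc < 2) {
    std::fprintf(stderr, "usage: %s <model file>\n", argv[0]);
    return 1;
  }
  std::ifstream in(argv[1], std::ios::binary);
  std::string bytes((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
  // The std::string never crosses the boundary; only .data()/.size() do.
  int err = Hypothetical_CreateModelFromBuffer(bytes.data(), bytes.size());
  std::printf("load %s\n", err == 0 ? "ok" : "failed");
  return err;
}
```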

@bernardohenz can you:

  • check tflite code path on your side
  • force-push without `D_GLIBCXX_USE_CXX11_ABI`

I could take a look and run CI on that, so we can cross-check

lissyx avatar Sep 24 '20 12:09 lissyx

If I understood correctly, they would be much slower than loading from a path, but it is still better to have this option than no option at all (we could add a warning message clarifying that this kind of loading is slower).

I want to make sure we don't put ourselves in a position where people will misuse it and have unrealistic expectations.

Maybe this should be warned about in the docs: "using this feature can have a negative impact on the amount of memory consumed by your application".

lissyx avatar Sep 24 '20 13:09 lissyx

@bernardohenz Your PR needs rebasing, it seems.

lissyx avatar Sep 28 '20 08:09 lissyx

@bernardohenz Please don't change file modes from 644 to 755.

lissyx avatar Sep 28 '20 10:09 lissyx

@bernardohenz You should now be able to run CI on PRs yourself.

lissyx avatar Sep 28 '20 15:09 lissyx

I've fixed the exposure of the API functions and tested the loading for TFLite.

I've performed a small time benchmark for running the client (both for loading a .pb and a .tflite):

Loading a dummy .pb:

| Inputs | Time for running ./deepspeech (avg of 20 calls) |
| --- | --- |
| model output_graph.pb | 0.54 |
| model output_graph.pb --init_from_bytes | 0.55 |
| model output_graph.pb --scorer deepspeech-0.8.1-models.scorer | 1.45 |
| model output_graph.pb --scorer deepspeech-0.8.1-models.scorer --init_from_bytes | 4.56 |

Loading a .tflite:

| Inputs | Time for running ./deepspeech (avg of 20 calls) |
| --- | --- |
| model deepspeech-0.8.2-models.tflite | 1.72 |
| model deepspeech-0.8.2-models.tflite --init_from_bytes | 1.85 |
| model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer | 2.50 |
| model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer --init_from_bytes | 5.66 |

bernardohenz avatar Sep 28 '20 16:09 bernardohenz

@bernardohenz There are still failures related to load_lm() call sites on https://community-tc.services.mozilla.com/tasks/groups/VhPPLNsKSzeyZDg5kxjxqQ

lissyx avatar Sep 29 '20 08:09 lissyx

I've performed a small time benchmark for running the client (both for loading a .pb and a .tflite):

Just for the sake of completeness, can you add figures for before your changes?

I've fixed the exposure of the API functions and tested the loading for TFLite.

Good, so now it's just about:

  • exposing through all our bindings,
  • getting builds green,
  • and adding some test coverage,

Right?

lissyx avatar Sep 29 '20 12:09 lissyx

@bernardohenz I see some merge commits; can you make sure this branch is clean of merges?

lissyx avatar Sep 30 '20 15:09 lissyx

@lissyx yep, I'm trying to squash the commits, sorry.

bernardohenz avatar Sep 30 '20 15:09 bernardohenz

Just writing down some notes from my talk with Bernardo last week on the status of this PR:

  • We fixed a segfault due to passing the address of a temporary
  • Bernardo is going to add some timing calls in the scorer loading code to figure out whether the slowdown comes from the extra copy on the client.cc side (in which case it doesn't matter) or from inside libdeepspeech, in which case it needs to be addressed (a rough timing sketch follows this comment)
  • Then we need to make sure all the bindings get covered and there's test coverage, as mentioned by @lissyx above

reuben avatar Oct 05 '20 10:10 reuben
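
As a rough illustration of the kind of timing calls mentioned above, wrapping the suspect steps with `std::chrono` is enough to tell whether the extra time goes into reading the file on the client side or into the library call; the steps being timed here are placeholders, not the real client.cc or scorer code.

```cpp
// Minimal timing sketch with std::chrono; the steps being timed are placeholders.
#include <chrono>
#include <cstdio>

template <typename F>
double SecondsSpent(F&& step) {
  auto start = std::chrono::steady_clock::now();
  step();
  std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

int main() {
  double read_s = SecondsSpent([] { /* read the scorer file into a buffer (client side) */ });
  double init_s = SecondsSpent([] { /* call the from-buffer API (library side) */ });
  std::printf("read file: %.3f s, init from bytes: %.3f s\n", read_s, init_s);
  return 0;
}
```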