Enable `load_from_bytes` for model and ExtScorer
This PR allows loading the model and ExtScorer from an array of bytes (`string`).

Some observations:

- For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib
- I've added a new argument (`--init_from_bytes`) in the client for testing the loading
- I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)
- I only tested loading .pb models. I will need some help implementing the loading of memmapped and tflite models.
No Taskcluster jobs started for this pull request
The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the
default branch) does not allow starting tasks for this pull request.
> For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib

This will be problematic for compatibility with some systems.
> I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)

Please introduce a new `DS_CreateModelFromBuffer`, for example?
> I only tested loading .pb models. I will need some help for implementing the loading of memmapped and tflite models.

I really don't see how this feature can work well with the TensorFlow infra without forcing the allocation of huge strings.
@bernardohenz I fail to see the use case where this can be useful for mmap-able file formats (PBMM, TFLite), given how much memory this is going to force TensorFlow to allocate.
@lissyx #3152 is one example
Does it? From looking at the implem, I'm unsure this can work without trashing memory of the device: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115
> I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)

> Please introduce a new `DS_CreateModelFromBuffer`, for example?

Also, when you introduce that, please try and expose it in as many bindings as you can.
> Does it? From looking at the implem, I'm unsure this can work without trashing memory of the device: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115
That's a good point, maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.
> Does it? From looking at the implem, I'm unsure this can work without trashing memory of the device: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

> That's a good point, maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.

That's much more invasive and risky, unless we can upstream it, but we know how long that can take. Depending on @bernardohenz's use-case, maybe it makes sense to just expose that for non-protobuf mmap files? Or even only for TFLite models?
> This PR allows the loading of model and ExtScorer through array of bytes (`string`)
>
> Some observations:
>
> 1. For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib
> 2. I've added a new argument (`--init_from_bytes`) in the client for testing the loading
> 3. I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)
> 4. I only tested loading .pb models. I will need some help for implementing the loading of memmapped and tflite models.
@bernardohenz I see you have code already for TFLite, can you describe what kind of help you need?
@lissyx I will be addressing the implementation of new methods for the API.
The points where I need some help are the implementation of loading pbmm and tflite models from a buffer. If I understood correctly, they would be much slower than loading from a path, but it's still better to have this option than no option at all (we could print a warning message that this loading is slower, for the sake of clarity).
Although I have implemented the code for tflite (just because I found `tflite::FlatBufferModel::BuildFromBuffer`), I haven't tested it yet; I have never compiled with tflite and will try that next week.
I have no answer for using `D_GLIBCXX_USE_CXX11_ABI=1` when compiling. When set to 0, the libs ended up segfaulting. I still haven't figured out what `D_GLIBCXX_USE_CXX11_ABI` does differently with what I've implemented. If you have any thoughts about this, it would be much appreciated.
> If I understood correctly, they would be much slower than loading from path, but still it's good to have this option than not having any option at all (we could put a message warning that this loading is slower for the sake of clarification).

I want to make sure we don't put ourselves in a position where people will misuse this and have unrealistic expectations.
> When set to 0, the libs ended up having segmentation fault. I still haven't figured out what `D_GLIBCXX_USE_CXX11_ABI` does differently on what I've implemented. If you have any thoughts about this, it would be much appreciated.

This does impact how C++ code handles strings, so maybe it depends on your env and in your case it makes sense? It would help if you could share more of a stack trace, for example.
@bernardohenz can you:

- check the tflite code path on your side
- force-push without `D_GLIBCXX_USE_CXX11_ABI`

I could take a look and run CI on that, so we can cross-check.
> If I understood correctly, they would be much slower than loading from path, but still it's good to have this option than not having any option at all (we could put a message warning that this loading is slower for the sake of clarification).

> I want to make sure we don't put ourselves in a position where people will misuse and have unrealistic expectations

Maybe this should be warned about in the docs: "using this feature can have a negative impact on the amount of memory consumed by your application".
@bernardohenz Your PR needs rebasing, it seems.
@bernardohenz Please don't change file's mode from 644 to 755.
@bernardohenz You should now be able to run PRs yourself.
I've fixed the API functions exposition and tested the loading for TFLite.
I've performed a small time benchmark for running the client (both for loading a .pb and a .tflite):
Loading a dummy .pb
| Inputs | Time (seconds) for running `./deepspeech`, avg of 20 calls |
|---|---|
| model output_graph.pb | 0.54 |
| model output_graph.pb --init_from_bytes | 0.55 |
| model output_graph.pb --scorer deepspeech-0.8.1-models.scorer | 1.45 |
| model output_graph.pb --scorer deepspeech-0.8.1-models.scorer --init_from_bytes | 4.56 |
Loading a .tflite
| Inputs | Time (seconds) for running `./deepspeech`, avg of 20 calls |
|---|---|
| model deepspeech-0.8.2-models.tflite | 1.72 |
| model deepspeech-0.8.2-models.tflite --init_from_bytes | 1.85 |
| model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer | 2.50 |
| model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer --init_from_bytes | 5.66 |
@bernardohenz There are still failures related to load_lm() call sites on https://community-tc.services.mozilla.com/tasks/groups/VhPPLNsKSzeyZDg5kxjxqQ
> I've performed a small time benchmark for running the client (both for loading a .pb and a .tflite):

Just for the sake of completeness, can you add figures from before your changes?
> I've fixed the API functions exposition, tested the loading for TFLite.
Good, so now it's just about:
- exposing it through all our bindings,
- getting builds green,
- and adding some test coverage,
Right?
@bernardohenz I see some merges, can you make sure this branch is clean of merges ?
@lissyx yep, I'm trying to squash the commits, sorry.
Just writing down some notes from my talk with Bernardo last week on the status of this PR:
- We fixed a segfault due to passing the address of a temporary
- Bernardo is going to add some timing calls in the scorer-loading code to figure out whether the slowdown comes from the extra copy on the client.cc side (in which case it doesn't matter) or from inside libdeepspeech, in which case it needs to be addressed.
- Then we need to make sure all the bindings are covered and there's test coverage, as mentioned by @lissyx above.