
Feature request: support for Linux/AArch64 + GPU

Open tsmprog opened this issue 3 years ago • 5 comments

If you have a feature request, then please provide the following information:

Is your feature request related to a problem? Please describe. We have an AArch64 device (NVIDIA Jetson AGX) and would like to use its GPU cores, as the non-CUDA version of Coqui STT tends to be too slow for real-time inference on our platform, as well as being too CPU-intensive.

Describe the solution you'd like Add Linux/AArch64 + GPU to the supported platforms

Describe alternatives you've considered We have tried TFLite models on these devices; they run, but tend to be of worse quality and, at least with DeepSpeech, ran slower than CUDA-enabled full-scale models.

With DeepSpeech, this project https://github.com/domcross/DeepSpeech-for-Jetson-Nano/releases managed to port 0.9.x to a Jetson Nano and AGX, showing that, at least with the DeepSpeech code base before your fork, this is possible. We have also built it ourselves for an older version of DeepSpeech, but the build process was tricky and likely beyond the scope of many teams.

tsmprog · Jun 17 '21 16:06

We have also built it ourselves

Pull requests would be greatly appreciated :)

reuben · Jun 17 '21 16:06

I'll see what I can do. In the project I am working on, device time is quite limited at the moment, and if I remember correctly, previous attempts to cross-compile DeepSpeech did not end well. That was for version 0.5, I think; maybe things will be better now with all your changes :)

tsmprog · Jun 17 '21 17:06

Thank you! We have some cross-compilation infrastructure up and running for ARM already; maybe you can base it off of that. Some references:

https://github.com/coqui-ai/STT/blob/6f2c7a8a7baaa76f5f2c59fd956971deb4afed2c/.github/workflows/build-and-test.yml#L2183-L2253

https://github.com/coqui-ai/STT/blob/6f2c7a8a7baaa76f5f2c59fd956971deb4afed2c/ci_scripts/tf-vars.sh#L169-L170

Most of it is just flags to Bazel, which the TF build system then handles on its own. But I'm sure CUDA complicates things. Let me know if there's any way I can help with getting the PR up!
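To give a rough idea, here is an untested sketch of what the invocation might look like if the existing AArch64 config can be combined with the CUDA one. The CUDA version, compute capabilities, and the assumption that the two configs compose cleanly for a Jetson target are guesses, not something that's been verified:

```sh
# Untested sketch: combine TF's stock aarch64 cross-compile config with
# the CUDA config. The crosstool still needs a CUDA toolchain that can
# target aarch64, which is likely the hard part for Jetson.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.2                 # JetPack 4.x ships CUDA 10.2
export TF_CUDA_COMPUTE_CAPABILITIES=5.3,7.2 # Jetson Nano (5.3), AGX Xavier (7.2)

bazel build --config=opt \
    --config=elinux_aarch64 \
    --config=cuda \
    //native_client:libstt.so
```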

reuben · Jun 17 '21 17:06

Thanks for the flags, much appreciated. I have a long list of tasks to do before I get to this, but I hope to give it a proper bash in the next six weeks or so.

tsmprog · Jun 21 '21 07:06

Any updates on this? I was looking to use CUDA on a Jetson Nano, but it doesn't look like that's possible.

tazz4843 · Aug 13 '22 22:08