dlib
Support for BLSTM and CTC
Hi! I've been using the library for a few days now, particularly the deep learning API, and I'm very impressed by its ease of use, documentation, and performance. However, I couldn't find support for BLSTM and CTC, so I was wondering if that is planned for a future release.
Thank you, and keep up the good work :)
Thanks :)
You should be able to implement CTC loss by writing a new loss class that implements this interface. Adding something like BLSTM is more complicated, since the tooling you need to make recurrent networks is missing. It's something I intend to add, and I almost included it in the first release, but I want to wait until the literature around RNNs stabilizes a little more before adding it to the API.
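For anyone picking this up: the core of CTC loss (Graves et al.) is a forward ("alpha") recursion over the label sequence with blanks interleaved. Below is a minimal standalone sketch of that recursion in plain C++ — it does not use dlib's loss interface, the function name is my own, and a real implementation should work in log space (or rescale alpha) to avoid underflow on long sequences:

```cpp
#include <cmath>
#include <vector>

// probs[t][c]: probability of symbol c at time t (c == 0 is the CTC blank).
// labels: the target sequence, without blanks.
// Returns -log P(labels | probs), i.e. the CTC loss for one sample.
double ctc_loss(const std::vector<std::vector<double>>& probs,
                const std::vector<int>& labels)
{
    const int T = probs.size();
    // Extended label sequence: blank, l1, blank, l2, ..., blank.
    std::vector<int> ext;
    ext.push_back(0);
    for (int l : labels) { ext.push_back(l); ext.push_back(0); }
    const int S = ext.size();

    // alpha[s]: total probability of all paths ending at ext[s] at time t.
    std::vector<double> alpha(S, 0.0);
    alpha[0] = probs[0][ext[0]];
    if (S > 1) alpha[1] = probs[0][ext[1]];

    for (int t = 1; t < T; ++t)
    {
        std::vector<double> next(S, 0.0);
        for (int s = 0; s < S; ++s)
        {
            double a = alpha[s];                 // stay on the same symbol
            if (s > 0) a += alpha[s-1];          // advance by one
            // Skipping a blank is allowed only between distinct labels.
            if (s > 1 && ext[s] != 0 && ext[s] != ext[s-2])
                a += alpha[s-2];
            next[s] = a * probs[t][ext[s]];
        }
        alpha.swap(next);
    }
    // A valid path may end on the last label or on the trailing blank.
    double p = alpha[S-1] + (S > 1 ? alpha[S-2] : 0.0);
    return -std::log(p);
}
```

Wrapping this (plus the matching backward pass for gradients) in a class that satisfies dlib's loss-layer interface is the part the reply above is pointing at.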
Thank you for the quick reply, I'll have a look at the CTC in the meantime :)
I am interested in RNNs as well. I was looking at the EXAMPLE_COMPUTATIONAL_LAYER_ interface, wondering how hard it would be to implement an LSTM layer. What kinds of difficulties do you anticipate, @davisking? What RNN-specific tooling is missing in dlib?
There needs to be a straightforward way for weight sharing and really also some kind of way to specify graphs with cycles rather than explicitly unrolling everything into a DAG.
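To make the unrolling point concrete: running an RNN over a sequence is equivalent to a DAG with one copy of the cell per time step, where every copy reuses the same parameters. That parameter reuse is exactly the weight sharing a DAG-based toolkit has to support. A toy scalar sketch in plain C++ (parameter names are illustrative, not dlib API):

```cpp
#include <cmath>
#include <vector>

// A single-unit RNN cell, h_t = tanh(w*h_{t-1} + u*x_t), applied once per
// input. The SAME w and u are used at every step -- unrolling turns the
// cycle into a chain of nodes that all share these two parameters.
double run_unrolled_rnn(double w, double u, const std::vector<double>& xs)
{
    double h = 0.0;                  // initial hidden state
    for (double x : xs)              // one DAG node per time step
        h = std::tanh(w*h + u*x);    // shared parameters w, u
    return h;                        // final hidden state
}
```

In a static DAG framework you would have to instantiate one layer per time step and then keep all their weight tensors in sync during training, which is the tooling gap described above.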
How about adding Generative Adversarial Networks? From what I've gathered with my admittedly limited knowledge, they should be a lot easier, since a GAN is basically two networks linked to each other. I think they might be more interesting than RNNs, as they can potentially be used for unsupervised learning.
FWIW, I found this issue on Google while looking to see if anyone was working on an LSTM RNN for dlib.
I'm searching for a modern and effective speaker recognition solution, and existing open-source projects seem bogged down in older ML algorithms, long and brittle toolchains, poor documentation, and an apparent lack of interest in packaged solutions.
A 'hoorah!' moment happened when I read this paper: https://arxiv.org/pdf/1705.02304.pdf
I'm a beginner in the field of DNNs but have been enjoying saturating my brain with papers and experimenting with both dlib and Keras. I was considering trying to implement the paper above in dlib because I already love the facial embedding paradigm: it does so much work for a consumer, but leaves the final step open to a wide variety of practical applications.
So that's how I arrived at this issue, and it's a long-winded way to say +1 for LSTM RNNs in dlib! :+1:
Having the same embedding paradigm for voice would single-handedly propel the entire speaker recognition field forward by one giant leap. :1st_place_medal:
If I were a bit more confident, I'd commit to contributing :disappointed: