dlib
Support for BLSTM and CTC
Hi! I've been using the library for a few days now, particularly the deep learning API, and I'm very impressed by its ease of use, documentation, and performance. However, I couldn't find support for BLSTM and CTC, so I was wondering if that is planned for a future release.
Thank you, and keep up the good work :)
Thanks :)
You should be able to implement CTC loss by writing a new loss class that implements this interface. Adding something like BLSTM is more complicated, since the tooling you need to make recurrent networks is missing. It's something I intend to add, and I almost included it in the first release, but I want to wait until the literature around RNNs stabilizes a little more before adding it to the API.
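For anyone picking this up: the core of CTC loss (Graves et al.) is a forward ("alpha") recursion over the label sequence with blanks interleaved. Below is a minimal standalone sketch of that recursion in plain C++ — it does not use dlib's loss interface, the function name is my own, and a real implementation should work in log space (or rescale alpha) to avoid underflow on long sequences:

```cpp
#include <cmath>
#include <vector>

// probs[t][c]: probability of symbol c at time t (c == 0 is the CTC blank).
// labels: the target sequence, without blanks.
// Returns -log P(labels | probs), i.e. the CTC loss for one sample.
double ctc_loss(const std::vector<std::vector<double>>& probs,
                const std::vector<int>& labels)
{
    const int T = probs.size();
    // Extended label sequence: blank, l1, blank, l2, ..., blank.
    std::vector<int> ext;
    ext.push_back(0);
    for (int l : labels) { ext.push_back(l); ext.push_back(0); }
    const int S = ext.size();

    // alpha[s]: total probability of all paths ending at ext[s] at time t.
    std::vector<double> alpha(S, 0.0);
    alpha[0] = probs[0][ext[0]];
    if (S > 1) alpha[1] = probs[0][ext[1]];

    for (int t = 1; t < T; ++t)
    {
        std::vector<double> next(S, 0.0);
        for (int s = 0; s < S; ++s)
        {
            double a = alpha[s];                 // stay on the same symbol
            if (s > 0) a += alpha[s-1];          // advance by one
            // Skipping a blank is allowed only between distinct labels.
            if (s > 1 && ext[s] != 0 && ext[s] != ext[s-2])
                a += alpha[s-2];
            next[s] = a * probs[t][ext[s]];
        }
        alpha.swap(next);
    }
    // A valid path may end on the last label or on the trailing blank.
    double p = alpha[S-1] + (S > 1 ? alpha[S-2] : 0.0);
    return -std::log(p);
}
```

Wrapping this (plus the matching backward pass for gradients) in a class that satisfies dlib's loss-layer interface is the part the reply above is pointing at.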
Thank you for the quick reply, I'll have a look at the CTC in the meantime :)
I am interested in RNNs as well. I was looking at the EXAMPLE_COMPUTATIONAL_LAYER_ interface, wondering how hard it would be to implement an LSTM layer. What kinds of difficulties do you anticipate, @davisking? What RNN-specific tooling is missing in dlib?
There needs to be a straightforward way for weight sharing and really also some kind of way to specify graphs with cycles rather than explicitly unrolling everything into a DAG.
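To make the unrolling point concrete: running an RNN over a sequence is equivalent to a DAG with one copy of the cell per time step, where every copy reuses the same parameters. That parameter reuse is exactly the weight sharing a DAG-based toolkit has to support. A toy scalar sketch in plain C++ (parameter names are illustrative, not dlib API):

```cpp
#include <cmath>
#include <vector>

// A single-unit RNN cell, h_t = tanh(w*h_{t-1} + u*x_t), applied once per
// input. The SAME w and u are used at every step -- unrolling turns the
// cycle into a chain of nodes that all share these two parameters.
double run_unrolled_rnn(double w, double u, const std::vector<double>& xs)
{
    double h = 0.0;                  // initial hidden state
    for (double x : xs)              // one DAG node per time step
        h = std::tanh(w*h + u*x);    // shared parameters w, u
    return h;                        // final hidden state
}
```

In a static DAG framework you would have to instantiate one layer per time step and then keep all their weight tensors in sync during training, which is the tooling gap described above.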
How about adding Generative Adversarial Networks? From what I've gathered with my admittedly limited knowledge, they should be a lot easier, since a GAN is basically two networks linked to each other. I think they might be more interesting than RNNs, as they can potentially be used for unsupervised learning.
FWIW, I found this issue on Google while looking to see if anyone was working on an LSTM RNN for dlib.
I'm searching for a modern and effective speaker recognition solution, and existing open-source projects seem bogged down in older ML algorithms, long and brittle toolchains, poor documentation, and an apparent lack of interest in packaged solutions.
A 'hoorah!' moment happened when I read this paper: https://arxiv.org/pdf/1705.02304.pdf
I'm a beginner in the field of DNNs but have been enjoying saturating my brain with papers and experimenting with both dlib and Keras. I was considering trying to implement the paper above in dlib because I already love the facial embedding paradigm: it does so much work for a consumer, but leaves the final step open to a wide variety of practical applications.
So that's how I arrived at this issue, and it's a long-winded way to say +1 for LSTM RNNs in dlib! :+1:
Having the same embedding paradigm for voice would single-handedly propel the entire speaker recognition field forward by one giant leap. :1st_place_medal:
If I were a bit more confident, I'd commit to contributing :disappointed: