swift-models

Speech to text example

Open saeta opened this issue 6 years ago • 7 comments

It'd be great to have an example speech-to-text model incorporated into our example models.

saeta (May 04 '19 16:05)

@saeta Can I work on this and submit a PR?

manideep2510 (Feb 26 '20 07:02)

@manideep2510 That would be great! As far as I know, no one else is working on such a representative model at this time. Thank you very much!

saeta (Feb 26 '20 07:02)

OK then, I will start working on it. I will try to implement Deep Speech 2. I couldn't find an implementation of CTC loss in Swift. Is there one, or should I implement it and push it to tensorflow/swift-apis?

manideep2510 (Feb 26 '20 08:02)

Sounds great, @manideep2510! I look forward to seeing what you're able to produce!

As for the CTC loss, I recommend first writing it next to your Deep Speech 2 implementation. This will let you iterate quickly without waiting for updated versions of swift-apis, and it will let you confirm that the model trains properly and converges. Once your Deep Speech 2 implementation is merged into this repository (swift-models) and the loss has a couple of additional uses, we can graduate it to swift-apis. We do this for almost all API additions, even things like attention layers, which were first duplicated inside the GPT-2 and BERT implementations. It lets us make sure new APIs are high quality (e.g. written to be usable and general) before incorporating them into the common APIs.

saeta (Feb 26 '20 15:02)

Understood. Thanks.

manideep2510 (Feb 29 '20 13:02)

@manideep2510 I think you can access CTC loss through the _Raw enumeration. You can also make it differentiable in Swift.

RahulBhalley (Mar 03 '20 08:03)

Thank you, I found the classes for CTC loss and decoder in the Enumerators.

manideep2510 (Mar 09 '20 03:03)
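
For readers following the thread, here is a rough sketch of what the CTC loss discussed above computes, written in plain Swift with no TensorFlow dependency. It is not the swift-apis or _Raw implementation and not code from this issue. The function name `ctcLoss`, its parameters, and the choice of index 0 as the blank symbol are all assumptions made for illustration. It handles a single utterance and works directly with probabilities, so it would underflow on long sequences; a practical version would work in log space and be batched.

```swift
import Foundation

/// Illustrative CTC negative log-likelihood for one utterance (hypothetical helper).
/// - probs: per-frame class probabilities, shape [frames][classes], each row summing to 1.
/// - labels: target label indices, without blanks.
/// - blank: index of the blank symbol (assumed to be 0 here).
func ctcLoss(probs: [[Double]], labels: [Int], blank: Int = 0) -> Double {
    // Extended target: blank, l1, blank, l2, ..., blank.
    var ext = [blank]
    for label in labels {
        ext.append(label)
        ext.append(blank)
    }
    let stateCount = ext.count
    let frameCount = probs.count
    guard frameCount > 0 else { return labels.isEmpty ? 0 : Double.infinity }

    // alpha[s]: total probability of all partial alignments that are in
    // extended position s after consuming the current frame.
    var alpha = [Double](repeating: 0, count: stateCount)
    alpha[0] = probs[0][blank]
    if stateCount > 1 { alpha[1] = probs[0][ext[1]] }

    for t in 1..<frameCount {
        var next = [Double](repeating: 0, count: stateCount)
        for s in 0..<stateCount {
            var sum = alpha[s]                       // stay in the same position
            if s >= 1 { sum += alpha[s - 1] }        // advance by one position
            // Skipping over a blank is allowed only between two different labels.
            if s >= 2 && ext[s] != blank && ext[s] != ext[s - 2] {
                sum += alpha[s - 2]
            }
            next[s] = sum * probs[t][ext[s]]
        }
        alpha = next
    }

    // A valid alignment ends on the last label or on the trailing blank.
    var total = alpha[stateCount - 1]
    if stateCount > 1 { total += alpha[stateCount - 2] }
    return -log(total)
}

// Two frames, two classes (blank = 0, one real label = 1), uniform probabilities.
// Three alignments collapse to [1]: (1,1), (blank,1), (1,blank), each with probability 0.25.
let uniform = [[0.5, 0.5], [0.5, 0.5]]
let loss = ctcLoss(probs: uniform, labels: [1])
print(loss)
```

The forward recursion avoids enumerating alignments explicitly, which is why the cost grows with frames times target length rather than exponentially.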