
PyTorchification of transducer

Open kylebgorman opened this issue 2 years ago • 5 comments

[copied from CUNY-CL/abstractness/issues/123]

There are a lot of pure Python loops in the transducer implementation, and many of them can be replaced with PyTorch functions.

kylebgorman avatar Dec 09 '22 17:12 kylebgorman

PyTorch Lightning (PTL) has a doc on "Finding bottlenecks in your code".

I wonder if this one in particular could be useful here: https://pytorch-lightning.readthedocs.io/en/stable/tuning/profiler_intermediate.html
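For a quick look outside of a Lightning `Trainer`, the standard library's `cProfile` gives a similar function-level breakdown. This is only a minimal sketch: `slow_expert` is a hypothetical stand-in for a loop-heavy function, not anything from the yoyodyne codebase, and `profile` is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def slow_expert(n: int) -> int:
    """Hypothetical stand-in for a loop-heavy function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile(func, *args) -> str:
    """Runs func under cProfile and returns the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(5)
    return stream.getvalue()


report = profile(slow_expert, 100_000)
print(report)
```

Sorting by cumulative time tends to surface the outer function that owns the slow loop rather than the small calls inside it.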

Adamits avatar Jan 30 '23 16:01 Adamits

Thanks @Adamits, I used the profiler and the major bottleneck is the expert functions. Since fixing this is going to require a fairly major overhaul, I'm going to leave this as a note and table it until time permits a deep dive.

bonham79 avatar Feb 05 '23 05:02 bonham79

I was thinking about how edit distance could be a possible bottleneck and wondering how speech people do it, given the importance of WER in ASR codebases.

I found https://pytorch.org/audio/main/generated/torchaudio.functional.edit_distance.html -- could be useful for us?

EDIT: It looks like that code just loops in Python, so maybe not. Maybe we want a library written in C or Cython instead, like https://pypi.org/project/editdistance/.
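For reference, this is roughly the computation a compiled library would be speeding up. It is a sketch of the textbook dynamic program with unit costs, not the torchaudio or yoyodyne code, and the function name is just for illustration.

```python
from typing import Sequence


def edit_distance(source: Sequence, target: Sequence) -> int:
    """Levenshtein distance with unit costs, keeping only two rows of the table."""
    # Row 0: turning an empty prefix of source into target[:j] takes j insertions.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        # Column 0: turning source[:i] into an empty target takes i deletions.
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion
                    current[j - 1] + 1,  # insertion
                    previous[j - 1] + (s != t),  # substitution, or free match
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))
print(edit_distance("the cat sat".split(), "the cat sat down".split()))
```

It works on any sequence, so the same function covers characters and WER-style word lists. The nested loop is the part that is slow in pure Python.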

Adamits avatar Feb 23 '23 15:02 Adamits

Yeah, short answer: write it in C++ with cache locality in mind, then wrap it and expose it to Python.

kylebgorman avatar Feb 23 '23 15:02 kylebgorman

Yeah, a nice little C++ module would make the edit distance calculation quicker. However, the main bottleneck is more the oracle itself: you have to continually update the position in the edit, which requires transferring the predicted edit action to the CPU for each edit. This GPU -> CPU communication is a killer, and it will stay that way until most of the oracle operations can be made tensor operations.
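To make the transfer cost concrete, here is a toy sketch of the two styles of position update. Everything in it is an assumption for illustration: the three-action inventory, the `ADVANCES` table, and both function names are hypothetical and do not reflect the actual oracle code.

```python
import torch

# Hypothetical action inventory: 0 = insert (stay put), 1 = copy, 2 = delete.
# Entry k says how far action k moves the position in the source string.
ADVANCES = torch.tensor([0, 1, 1])


def step_on_host(positions: list, actions: torch.Tensor) -> list:
    """Updates positions in Python; each .item() copies one value to the host."""
    return [
        pos + (1 if actions[i].item() in (1, 2) else 0)
        for i, pos in enumerate(positions)
    ]


def step_on_device(positions: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """Updates positions with a table lookup; the result stays on the device."""
    return positions + ADVANCES.to(actions.device)[actions]


actions = torch.tensor([0, 1, 2, 1])
print(step_on_host([0, 0, 3, 5], actions))
print(step_on_device(torch.tensor([0, 0, 3, 5]), actions).tolist())
```

On a GPU, the first version synchronizes once per batch element per step, while the second keeps the positions as a tensor and never leaves the device. The real oracle tracks more state than a single position, so this only shows the shape of the change.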

bonham79 avatar Feb 24 '23 16:02 bonham79