Add save functionality to CTCDecoder

Open FredHaa opened this issue 6 months ago • 0 comments

🚀 The feature

Add a CTCDecoder.save_to_dir(save_dir: str | Path) method that saves everything required to rebuild the decoder (the lexicon, tokens, KenLM file, decoder_options, and anything else needed) to a directory.

Saving the KenLM file requires either support in flashlight-text or passing the file path to the CTCDecoder constructor instead of the KenLM object, so that the file can be copied to save_dir.
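A minimal sketch of what the proposed save/load round trip could look like, written as standalone helpers rather than a CTCDecoder method. The names `save_decoder_to_dir`, `load_decoder_kwargs`, `decoder_config.json` and the on-disk layout are hypothetical and not part of torchaudio. The sketch assumes the lexicon and KenLM file paths are still known at save time (the path-based option above) and that `decoder_options` uses the same keyword names as `torchaudio.models.decoder.ctc_decoder`.

```python
import json
import shutil
from pathlib import Path
from typing import Optional, Sequence, Union

# Hypothetical manifest name; torchaudio defines no on-disk format for CTCDecoder.
CONFIG_FILE = "decoder_config.json"


def save_decoder_to_dir(
    save_dir: Union[str, Path],
    tokens: Sequence[str],
    decoder_options: dict,
    lexicon_path: Optional[Union[str, Path]] = None,
    lm_path: Optional[Union[str, Path]] = None,
) -> Path:
    """Copy everything needed to rebuild the decoder into save_dir (hypothetical helper)."""
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    config = {
        "tokens": "tokens.txt",
        "lexicon": None,
        "lm": None,
        "decoder_options": dict(decoder_options),
    }
    # One token per line, the tokens-file format ctc_decoder accepts.
    (save_dir / config["tokens"]).write_text("\n".join(tokens) + "\n", encoding="utf-8")
    # These files can only be copied because their original paths are still known.
    if lexicon_path is not None:
        config["lexicon"] = "lexicon.txt"
        shutil.copy2(lexicon_path, save_dir / config["lexicon"])
    if lm_path is not None:
        # Keep the original suffix (.arpa, .bin, ...) of the KenLM file.
        config["lm"] = "lm" + Path(lm_path).suffix
        shutil.copy2(lm_path, save_dir / config["lm"])
    (save_dir / CONFIG_FILE).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return save_dir


def load_decoder_kwargs(save_dir: Union[str, Path]) -> dict:
    """Return keyword arguments intended for ctc_decoder(**kwargs) (hypothetical helper)."""
    save_dir = Path(save_dir)
    config = json.loads((save_dir / CONFIG_FILE).read_text(encoding="utf-8"))
    kwargs = dict(config["decoder_options"])
    kwargs["tokens"] = str(save_dir / config["tokens"])
    # Optional files map back to absolute paths, or None when they were not saved.
    for key in ("lexicon", "lm"):
        kwargs[key] = str(save_dir / config[key]) if config[key] else None
    return kwargs
```

Because `ctc_decoder` accepts file paths for `lexicon`, `tokens`, and `lm`, rebuilding would then be `ctc_decoder(**load_decoder_kwargs(save_dir))`. Storing relative file names in the manifest keeps the directory relocatable, which matters when it is uploaded to and downloaded from a hub.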

Motivation, pitch

Hugging Face transformers is considering replacing its dependency on pyctcdecode with the torchaudio CTCDecoder (huggingface/transformers/issues/41230).

To support pushing the decoder to the Hub, CTCDecoder needs an equivalent of pyctcdecode.BeamSearchDecoderCTC.save_to_dir.

I'll be happy to make a PR.

Alternatives

No response

Additional context

No response

FredHaa, Oct 02 '25 12:10