
Add Wav2Vec2BertProcessorWithLM

Open FredHaa opened this issue 9 months ago • 1 comments

Feature request

Wav2Vec2-Bert was open-sourced and integrated into Transformers at the end of last year. However, it lacks an easy integration with pyctcdecode similar to Wav2Vec2ProcessorWithLM. This should be fairly trivial to implement, since Wav2Vec2Processor and Wav2Vec2BertProcessor are very similar; the only difference is that they use different feature extractors.

Motivation

Having a Wav2Vec2BertProcessorWithLM class would make it possible to use Wav2Vec2-Bert with a kenlm model in a Transformers ASR pipeline.

Your contribution

I can submit a PR.

FredHaa avatar May 06 '24 10:05 FredHaa

cc @sanchit-gandhi @ylacombe

LysandreJik avatar May 06 '24 12:05 LysandreJik

Hey @FredHaa, #28706 should fix this, I'm reopening it! Note that you would have to use Wav2Vec2ProcessorWithLM and not Wav2Vec2BertProcessorWithLM!
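For readers landing here, a minimal sketch of what that usage might look like once #28706 is in your installed version. It assumes `pyctcdecode` and `kenlm` are installed and that `checkpoint` points to a repo or local folder containing both a fine-tuned Wav2Vec2-Bert CTC model and the language model files. The function name `transcribe_with_lm` and the checkpoint argument are placeholders for illustration, not part of the library.

```python
def transcribe_with_lm(checkpoint, audio, sampling_rate=16000):
    """Transcribe one mono waveform (1-D float array) using LM-boosted CTC decoding.

    `checkpoint` is a hypothetical repo id or local path; it is assumed to hold
    a Wav2Vec2-Bert CTC model plus the pyctcdecode/kenlm language model files.
    """
    # Imported lazily so this sketch can be loaded without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2BertForCTC, Wav2Vec2ProcessorWithLM

    # Per the comment above: use Wav2Vec2ProcessorWithLM, not a Bert-specific class.
    processor = Wav2Vec2ProcessorWithLM.from_pretrained(checkpoint)
    model = Wav2Vec2BertForCTC.from_pretrained(checkpoint)

    # The feature extractor turns the raw waveform into model inputs.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # batch_decode runs beam search with the language model on the raw logits.
    return processor.batch_decode(logits.numpy()).text[0]
```

Calling `transcribe_with_lm("your-org/your-w2v-bert-with-lm", waveform)` should then return a single transcription string, assuming the checkpoint is laid out as described.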

ylacombe avatar May 20 '24 09:05 ylacombe

#28706 has been merged, so I'm closing the issue for now. Feel free to ask questions!

ylacombe avatar May 20 '24 11:05 ylacombe