Sanchit Gandhi

26 issues by Sanchit Gandhi

❓ Questions and Help. What is your question? Many thanks for uploading the fine-tuned model checkpoints for Enhanced Direct Speech-to-Speech Translation in the recent PR https://github.com/facebookresearch/fairseq/pull/4588. Having downloaded...

question
needs triage

Adds section on (automatic) speech recognition.

First of all, thank you for your amazing work on the Whisper project and for open-sourcing the family of pre-trained checkpoints - these are of tremendous benefit to both the...

Original codebase: https://github.com/haoheliu/AudioLDM
Checkpoints: https://huggingface.co/spaces/haoheliu/audioldm-text-to-audio-generation/tree/main/ckpt

TODOs

**UNet**
- [x] Convert UNet weights
- [x] Add new modelling code
- [x] Verify correctness

**VAE**
- [x] Convert VAE weights
- [x]...

Feature request: Wav2Vec2 is one of the most popular speech recognition models, used over 2 million times monthly. In the PyTorch modelling code, we have Wav2Vec2 for speech recognition...

TensorFlow
Good Second Issue
Feature request

Feature request: The PR https://github.com/huggingface/transformers/pull/21754 adds the PyTorch version of `WhisperForAudioClassification`. It would be great to add the Flax equivalent for cross-library equivalence ♻️ Motivation: Whisper is an...

Good Second Issue
Feature request
Flax

Feature request: Firstly, thank you to @Narsil for developing the speech recognition pipeline. It's incredibly helpful for running the full speech-to-text mapping in one call, pre and...

What does this PR do? We should only filter training samples by our audio length criterion when fine-tuning ASR systems. The eval and test sets should **not** be filtered....
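A minimal sketch of the filtering rule described above, in plain Python. The split names, the `audio`/`sampling_rate` sample layout, the duration bounds and both helper functions are hypothetical and chosen for illustration; they are not the PR's actual code or argument names.

```python
# Hypothetical duration bounds in seconds; the PR's real arguments are not shown here.
MIN_DURATION_S = 0.0
MAX_DURATION_S = 30.0


def is_audio_in_length_range(sample):
    """Return True if the sample's duration (in seconds) lies strictly within the bounds."""
    duration = len(sample["audio"]) / sample["sampling_rate"]
    return MIN_DURATION_S < duration < MAX_DURATION_S


def filter_by_audio_length(splits):
    """Apply the length filter to the training split only; eval/test pass through unchanged."""
    filtered = {}
    for name, samples in splits.items():
        if name == "train":
            filtered[name] = [s for s in samples if is_audio_in_length_range(s)]
        else:
            filtered[name] = list(samples)
    return filtered


if __name__ == "__main__":
    splits = {
        "train": [
            {"audio": [0.0] * 20, "sampling_rate": 10},   # 2 s: kept
            {"audio": [0.0] * 400, "sampling_rate": 10},  # 40 s: dropped
        ],
        "test": [{"audio": [0.0] * 400, "sampling_rate": 10}],  # 40 s: kept, test is never filtered
    }
    filtered = filter_by_audio_length(splits)
    print(len(filtered["train"]), len(filtered["test"]))  # prints "1 1"
```

Filtering the eval or test sets would silently drop samples from the reported metrics, which is why the sketch leaves every non-training split untouched.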

What does this PR do? Can be used to fine-tune Flax Whisper for speech recognition. Tested and verified as working with the following (dummy) config:

```
run_flax_speech_recognition_seq2seq.py \
  --model_name_or_path...
```

PR to make the Bark model a HF `PreTrainedModel`. The `PreTrainedModel` class takes care of all loading / saving logic, enabling checkpoints to be downloaded / pushed directly to the...