Samsul Rahmadani (munggok) issues

Results: 10 issues of Samsul Rahmadani (munggok)

Normalize non-tesseract ocr bounding box

Hi @NielsRogge, I'm trying to use an external OCR engine (PaddleOCR or Google Vision) for processing with LayoutLMv2. The [docs](https://huggingface.co/docs/transformers/model_doc/layoutlmv2) state that you need to normalize each word's bounding box with...
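
The excerpt above is cut off, so what follows is only a minimal sketch of the kind of normalization being asked about. It assumes the 0-1000 scale described in the linked LayoutLMv2 docs, and it assumes the external OCR engine returns each word as a four-point polygon in pixel coordinates. The helper name `polygon_to_bbox` and the sample values are hypothetical and not taken from the issue.

```python
def polygon_to_bbox(points):
    """Collapse a list of (x, y) polygon points into [x0, y0, x1, y1].

    Hypothetical helper: assumes the OCR engine reports each word as a
    polygon in pixel coordinates rather than an axis-aligned box.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs), max(ys)]


def normalize_bbox(bbox, width, height):
    """Scale a pixel-space [x0, y0, x1, y1] box to the 0-1000 range.

    width and height are the page image dimensions in pixels.
    """
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# Made-up example: one word polygon on a 500 x 1000 pixel page.
word_polygon = [(50, 100), (150, 102), (148, 200), (52, 198)]
pixel_box = polygon_to_bbox(word_polygon)
normalized_box = normalize_bbox(pixel_box, width=500, height=1000)
```

Taking the min/max over the polygon points is a design choice for slightly rotated text; it yields the smallest axis-aligned box that contains the word.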

[Feature request] Pipeline remove download file after process and extract single language

Hi, thank you for releasing the tool! Since the CC dump is very disk-demanding, can we have an optional pipeline step like this: 1. immediately process (pipeline step) each file in the download command...

enhancement

Feature : Add support for VAD filter

Thank you for releasing the code. Since this implementation requires less memory than other implementations, adding VAD (voice activity detection) should be more suitable. Voice activity detection makes Whisper more...

Support for custom (converted) whisper.cpp format

Hi, first of all, awesome app. AFAIK, right now only the "official" whisper.cpp models are supported in the app. It would be really great if we could specify/load any custom whisper.cpp converted [format](https://github.com/ggerganov/whisper.cpp/tree/master/models#fine-tuned-models) this...

RuntimeError: Could not find response key token IDs when using bloom model and tokenizer to train

It works fine if I use GPT-J. I guess this is because of the tokenizer and this line: https://github.com/databrickslabs/dolly/blob/03bf3852daa42e6091a39483dda0714c02de7382/training/trainer.py#L52 Any tips on adjusting it so it can use models other than GPT-J?...

RAM crash when use collect method

First of all, thanks for releasing the code. I have a dataset (mC4) of about 110 GB, and my machine has a 96-core CPU and 350 GB of RAM. I've successfully created 524GB...

some question to prepare multilinguality training from scratch

Congrats on releasing Parler-TTS v2, @sanchit-gandhi, @ylacombe, and anyone else involved. I am trying to reproduce multilingual training and have some questions about training it multilingually: 1....

[Help]: Speed Up Emilia Pipeline

Thanks for creating the Emilia pipeline. I am using it now for my language, and so far so good. Is there a way to speed it up? Right now, 1 hour of audio...

information on 24khz model

Hey @adelacvg, thanks for sharing the code. After reading it, I want to ask you a few questions about the new 24k model, if you don't mind: 1. what makes different...

TTS training code

Hello, I noticed that the scripts provide pretraining for the speech encoder and video (CMIIW), but I can't seem to find the code for training TTS (vita/model/vita_tts). AFAIK, VITA 1.5...