bozden comments

Results: 108 comments of bozden

Adding Latvian sentence cleaners

Good idea, @raivisdejus. I think you are trying to correct this:

> The "?" inside words were caused by an encoding issue during import from old sentence collector, unicode characters...
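Here is a minimal sketch of the kind of cleaner rule this points at. It assumes the debris always shows up as a "?" between two letters. The names `BROKEN_CHAR`, `has_encoding_debris` and `clean` are hypothetical and do not come from the actual cleaner code:

```python
import re

# Hypothetical rule: a "?" sandwiched between two word characters is almost
# never real punctuation; it usually marks a character lost in a bad encoding
# round-trip (e.g. Latvian "ā" or "ū" turned into "?").
# In Python 3, \w matches Unicode letters, so "ā", "š" etc. count as letters.
BROKEN_CHAR = re.compile(r"\w\?\w")


def has_encoding_debris(sentence: str) -> bool:
    """Return True if the sentence contains a '?' inside a word."""
    return BROKEN_CHAR.search(sentence) is not None


def clean(sentences):
    """Keep only the sentences without mid-word '?' debris."""
    return [s for s in sentences if not has_encoding_debris(s)]


print(clean(["Labdien, k? j?s j?taties?", "Labdien, kā jūs jūtaties?"]))
```

This is a heuristic. A genuine "?" with a missing space after it (e.g. "what?no") would also be flagged, which is probably acceptable for a cleaner.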

Is there a limit to the audio duration?

Hey @JJun-Guo, recordings in Common Voice are currently limited to 10 seconds. Here is a related recent discussion on allowing longer recordings: https://discourse.mozilla.org/t/discussion-relaxation-of-the-10-sec-recording-limitation/114142

I need to check the code, but from memory, it was 1 sec but was lowered to 0.5 sec... Actually, as it also includes silences, short utterances can easily...

I was wrong. It is 1 sec; 0.5 sec is for the benchmark sentences (numbers, etc.): https://github.com/common-voice/common-voice/blob/3bccdf446f6acd8a9afda1db7a9a1664457e611d/web/src/components/pages/contribution/speak/speak.tsx#L42 But as I stated in the link given in the previous post, state-of-the...
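To make these limits concrete, here is a small sketch of the check. It assumes a 1 sec minimum for regular sentences, 0.5 sec for benchmark sentences and the 10 sec maximum mentioned earlier. The constant and function names are hypothetical, and this is not the actual logic in speak.tsx. Treating the exact boundary values as valid is also an assumption:

```python
# Hypothetical constants mirroring the limits discussed in this thread;
# the real values live in the Common Voice web client (speak.tsx).
MIN_DURATION_S = 1.0            # regular sentences
MIN_DURATION_BENCHMARK_S = 0.5  # short benchmark prompts (numbers etc.)
MAX_DURATION_S = 10.0           # current upper limit per recording


def check_duration(duration_s: float, is_benchmark: bool = False) -> str:
    """Classify a recording as 'too_short', 'too_long' or 'ok'.

    The measured duration includes leading/trailing silence, so a very
    short utterance can still pass the minimum.
    """
    minimum = MIN_DURATION_BENCHMARK_S if is_benchmark else MIN_DURATION_S
    if duration_s < minimum:
        return "too_short"
    if duration_s > MAX_DURATION_S:
        return "too_long"
    return "ok"


print(check_duration(0.7), check_duration(0.7, is_benchmark=True))
```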

AFAIK, a rule of thumb is to train a model on data similar to what it will see in the wild. For a general-purpose ASR model where the model is subjected to everyday...

If you are working on the cv-sentence-extractor rules (first run), getting longer sentences is better, I think. It is easier to get shorter sentences from other sources. Once it gets...

@MichaelKohler, can this be made adaptive? I mean, rather than enforcing an absolute minimum, set a "requested_minimum"; if 3 sentences meeting it are not found, fill the rest with shorter ones...
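Here is a rough sketch of the adaptive idea. It assumes the sentences passed in already match the extractor rules and that `requested_minimum` is measured in characters (it could just as well be words). `pick_sentences` is a hypothetical helper, not part of cv-sentence-extractor. Filling the leftover slots with the longest shorter sentences is just one possible choice:

```python
from typing import List


def pick_sentences(candidates: List[str], requested_minimum: int,
                   max_per_article: int = 3) -> List[str]:
    """Pick up to max_per_article sentences, preferring long ones.

    candidates: sentences from one article that already pass the rules.
    requested_minimum: preferred minimum length in characters (a soft limit,
    not a hard filter).
    """
    long_enough = [s for s in candidates if len(s) >= requested_minimum]
    shorter = [s for s in candidates if len(s) < requested_minimum]
    picked = long_enough[:max_per_article]
    # Not enough sentences meet the soft minimum: fill the remaining slots
    # with the longest of the shorter ones instead of returning fewer.
    shorter.sort(key=len, reverse=True)
    picked += shorter[:max_per_article - len(picked)]
    return picked


article = ["Short one.", "This sentence is clearly long enough to pass.",
           "Mid length text.", "Tiny."]
print(pick_sentences(article, requested_minimum=30))
```

Sorting the fallback by length means the article still contributes its longest available sentences instead of whichever short ones happen to come first.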

As you know, working on this was on my to-do list, if only I could get really good results... I'll look into this. E.g., sorting sentences by length can help...

Very good point... But this is how it works now, isn't it? So, as of now, if an article has 3 sentences, they are taken if the rules match. One...

As I mentioned above, with state-of-the-art models and HW advancements, it is better to collect longer audio, and thus longer texts. A change in this repo towards this goal would...