Oliver Adams

78 comments by Oliver Adams

I don't have a strong feeling either way on this. I can see some motivation for using Utterance objects as representations inside the Corpus. Currently everything is just prefixes, but...

I agree. Let's go with option 2.

I agree it'd be good to think about something long term. I'm pretty open to where we host such things. You're a beacon of light when it comes to making...

Hi, did you face issues with [this approach?](https://persephone.readthedocs.io/en/latest/quickstart.html#saving-and-loading-models-transcribing-untranscribed-data)

Just for posterity: as per our offline discussion, for now we'll focus on robust support for ELAN files and come to this issue later.

Similarly, sox FAIL output should be caught and put in the log. `preprocess.wav` backs off to pydub/ffmpeg in such cases, so it's not an issue unless those fail too.
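
A rough sketch of the catch-and-log idea, not the actual `preprocess` code: run the primary command, write its stderr to the log if it fails, then hand over to a fallback. The helper name `run_with_fallback` and the `fallback` callable are made up for illustration; only the standard-library `subprocess` and `logging` calls are real.

```python
import logging
import subprocess
from typing import Callable, Sequence

logger = logging.getLogger(__name__)


def run_with_fallback(primary_cmd: Sequence[str],
                      fallback: Callable[[], None]) -> str:
    """Run primary_cmd; on failure, log its stderr and call fallback.

    Hypothetical helper. Returns "primary" if the command exited with
    status 0, otherwise "fallback" after the fallback has been called.
    """
    try:
        result = subprocess.run(primary_cmd, capture_output=True,
                                text=True, check=False)
    except OSError as exc:
        # The executable is missing or could not be started at all.
        logger.warning("Could not run %s: %s", primary_cmd[0], exc)
    else:
        if result.returncode == 0:
            return "primary"
        # Keep the tool's own error output in the log instead of losing it.
        logger.warning("%s failed (exit %d): %s", primary_cmd[0],
                       result.returncode, result.stderr.strip())
    fallback()
    return "fallback"
```

If the fallback itself raises, the exception propagates to the caller, which matches the point above that it only becomes a real problem when the backup path fails too.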

I'm leaning towards segmenting on all Unicode space characters. Pros of segmenting on other Unicode space characters: - Users can't accidentally use a wrong space character, which would lead to...
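
For reference, a minimal sketch of what segmenting on all Unicode space characters could look like in Python. The function name `segment_on_spaces` is hypothetical; the behaviour comes from `str.split()` with no arguments, which splits on any run of characters Python treats as whitespace, including no-break space and ideographic space.

```python
from typing import List


def segment_on_spaces(text: str) -> List[str]:
    """Split text into tokens on any run of Unicode whitespace.

    Hypothetical helper: str.split() with no separator uses the same
    notion of whitespace as str.isspace(), so U+00A0, U+2003, U+3000
    and friends are treated like an ordinary ASCII space.
    """
    return text.split()


# No-break space, em space, plain space and ideographic space all split.
tokens = segment_on_spaces("a\u00a0b\u2003c d\u3000e")
```

One caveat: zero-width space (U+200B) is a format character rather than whitespace as far as Python is concerned, so it would need separate handling if it should also count as a boundary.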

Yeah, detecting voices and breaking on silence is definitely a good angle to take. However, for training data it doesn't fully solve the problem because we still need to know...
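
To make the "breaking on silence" idea concrete, here is a toy sketch that works on a plain sequence of amplitude values. It is an assumption-laden illustration, not a proposal for the real implementation: the function name `voiced_regions`, the fixed amplitude `threshold`, and the `min_silence_len` parameter are all invented here, and a real system would more likely use a proper voice activity detector.

```python
from typing import List, Sequence, Tuple


def voiced_regions(samples: Sequence[float], threshold: float,
                   min_silence_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of non-silent regions, end exclusive.

    A sample counts as silent when its absolute value is below threshold.
    A region is closed once min_silence_len consecutive silent samples
    have been seen; shorter pauses stay inside the region.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # index where the current voiced region began
    silent_run = 0    # consecutive silent samples inside the region
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_len:
                # Region ends just after the last loud sample.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Input ended mid-region; trim any trailing short pause.
        segments.append((start, len(samples) - silent_run))
    return segments
```

This only yields audio spans; it says nothing about which part of a transcription belongs to each span, which is the gap noted above for training data.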