Niko Partanen

Results 20 comments of Niko Partanen

I currently work in the [IKDP-2](https://langdoc.github.io/IKDP-2/) research project, where we are testing different tools with our Komi-Zyrian data. This could be very useful for us. I already did the training part...

Hi Oliver! The majority of the data consists of conversations and interviews between native speakers, although usually one speaker is dominant. Older recordings tend to have more monologues. There is...

> Oh wow, that's quite a lot of data! (by my standards)

Thanks! We've had several native speakers transcribing for many years in the project, and especially including of archival...

Thanks for jumping in, @alexis-michaud!

> These units are declared as such, i.e. 'əəə...' is not a sequence of three vowels but one object

I see, this sounds like a...

I have now finished training for more than a hundred epochs. After 50 epochs the PER settled at around 0.70 and stayed there, while the training LER kept going down all the way to...
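
For readers unfamiliar with the metrics mentioned above: error rates such as PER (phoneme error rate) and LER (label error rate) are commonly defined as the edit distance between the predicted and reference label sequences, divided by the reference length. The following is a minimal sketch under that assumption. It is not Persephone's own implementation, and the function names `edit_distance` and `error_rate` are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j labels of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[Sequence[str]],
               hyps: Sequence[Sequence[str]]) -> float:
    """Total edits divided by total reference length (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_len = sum(len(r) for r in refs)
    return total_edits / total_len
```

Under this definition a value of 0.70 would mean roughly seven edits per ten reference labels, so a training error that keeps falling while the held-out error stays flat is the usual sign of overfitting.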

Just reporting that I've now been testing this with different configurations. When I removed the word boundary marker and combined some of the primary interjections into individual labels, the best...

Hi Oliver! Thanks for the help and feedback! This training was done with one speaker's utterances, which total 3 hours. The entire corpus is around 35 hours, but I wanted...

I'm testing the training now with new data. The training process is otherwise the same as described above, but I'm getting an error. I'm assuming some audio file has a...

I've fixed it now. I think there must have been a very short audio fragment there, so I set the threshold for that higher in my processing script. Now...
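
The kind of duration threshold described above can be sketched with Python's standard `wave` module. This is only an illustration, not the actual processing script: the threshold value `MIN_DURATION_S` and the helper names `wav_duration` and `long_enough` are hypothetical, and the files are assumed to be uncompressed PCM WAV.

```python
import wave

# Hypothetical cut-off in seconds; the value used in the real script is not stated.
MIN_DURATION_S = 0.1


def wav_duration(path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def long_enough(paths, min_duration: float = MIN_DURATION_S) -> list:
    """Keep only the files whose duration reaches the threshold."""
    return [p for p in paths if wav_duration(p) >= min_duration]
```

Running such a filter before feature extraction drops fragments that are too short to yield usable frames, and keeps the input order of the remaining files.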

Thanks for the explanation, @oadams! I have a few more questions: When I create the corpus, i.e. like this:

```
from persephone.corpus import Corpus
corpus = Corpus(feat_type="fbank", label_type="phonemes", tgt_dir="experiment")
```

It will...