Niko Partanen comments

Results: 20 comments by Niko Partanen

Model saving, loading, transcribing unseen data

I currently work in the [IKDP-2](https://langdoc.github.io/IKDP-2/) research project, where we are testing different tools with our Komi-Zyrian data. This could be very useful for us. I already did the training part...

Model saving, loading, transcribing unseen data

Hi Oliver! The majority of the data is conversations and interviews between native speakers, although usually one speaker is dominant. Older recordings tend to have more monologues. There is...

Model saving, loading, transcribing unseen data

> Oh wow, that's quite a lot of data! (by my standards)

Thanks! We've had several native speakers transcribing for many years in the project, and especially including of archival...

Model saving, loading, transcribing unseen data

Thanks for jumping in, @alexis-michaud!

> These units are declared as such, i.e. 'əəə...' is not a sequence of three vowels but one object

I see, this sounds like a...

Model saving, loading, transcribing unseen data

I have now finished training for more than a hundred epochs. After 50 epochs the PER stuck at around 0.70 and stayed there, while the training LER kept going down all the way to...
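For readers unfamiliar with the metrics mentioned here, the following is a minimal sketch of how an edit-distance-based error rate can be computed. It assumes PER and LER are the usual phoneme and label error rates (total edits divided by total reference length); the function names are hypothetical, and the toolkit's own computation may differ in detail.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two label sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j labels of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_label in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_label in enumerate(hyp, start=1):
            cost = 0 if ref_label == hyp_label else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[Sequence[str]],
               hyps: Sequence[Sequence[str]]) -> float:
    """Total edits over all utterances divided by total reference length."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_length = sum(len(r) for r in refs)
    if total_length == 0:
        raise ValueError("reference transcriptions are empty")
    return total_edits / total_length
```

Under this definition, a rate of 0.70 means roughly seven edits are needed per ten reference labels to turn the model output into the reference transcription.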

Model saving, loading, transcribing unseen data

Just reporting that I've now been testing this with different configurations. When I removed the word boundary marker and combined some of the primary interjections into individual labels, the best...

Suggestions wanted with training parameters

Hi Oliver! Thanks for the help and feedback! This training was done with one speaker's utterances, which total 3 hours. The entire corpus is around 35 hours, but I wanted...

Suggestions wanted with training parameters

I'm testing the training now with new data. The training process is otherwise the same as described above, but I'm getting an error. I'm assuming some audio file has a...

Suggestions wanted with training parameters

I fixed it now. I think there must have been a very short audio fragment there, so I raised the threshold for that in my processing script. Now...
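A duration filter of the kind described above might look roughly like the following. This is only a sketch using Python's standard `wave` module, not the actual processing script: the function names and the 0.3 s default threshold are hypothetical, and it assumes the utterances are stored as individual WAV files in one directory.

```python
import wave
from pathlib import Path

# Hypothetical threshold; the value used in the original script is not stated.
MIN_DURATION_S = 0.3


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def split_by_duration(wav_dir, min_duration_s=MIN_DURATION_S):
    """Split the WAV files in wav_dir into (kept, skipped) lists of paths."""
    kept, skipped = [], []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        if wav_duration_seconds(path) >= min_duration_s:
            kept.append(path)
        else:
            skipped.append(path)
    return kept, skipped
```

Returning the skipped files as well makes it easy to check which fragments were dropped before building the corpus.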

Suggestions wanted with training parameters

Thanks for the explanation, @oadams! I have a few more questions: When I create the corpus, i.e. like this:

```
from persephone.corpus import Corpus
corpus = Corpus(feat_type="fbank", label_type="phonemes", tgt_dir="experiment")
```

It will...