g2p-seq2seq

Long words - right bucket size/parameters?

entenbein opened this issue 4 years ago · 3 comments

Hi folks,

I trained a model for German and now I'm struggling with the predicted output for longer words (e.g. 39 letters ≙ 34 phones, yeah, German...). For those words, the last phones of the prediction are repeated over and over again.

For training I set max_length=50. The results got better, but there are still some phone repetitions.

How do the other two bucket parameters influence the predicted transcriptions?

Thanks a lot!

entenbein · Jul 29 '20 08:07
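For background, the classic TensorFlow seq2seq setup that g2p-seq2seq was originally built on groups training pairs into (input_length, output_length) buckets: each pair goes into the smallest bucket it fits, and pairs longer than the largest bucket are silently skipped, so very long words may never be seen in training at all. A minimal sketch of that assignment logic, with hypothetical bucket sizes rather than the tool's actual defaults:

```python
# Illustrative sketch of (input_len, output_len) bucketing as in the classic
# TensorFlow seq2seq tutorial. The bucket sizes below are hypothetical
# examples, not g2p-seq2seq's defaults.
buckets = [(10, 12), (20, 22), (50, 54)]  # (max graphemes, max phonemes)

def pick_bucket(word, phones):
    """Return the smallest bucket that fits this (word, pronunciation) pair,
    or None if the pair exceeds even the largest bucket."""
    for bucket_id, (source_size, target_size) in enumerate(buckets):
        if len(word) < source_size and len(phones) < target_size:
            return bucket_id
    return None  # pair is silently dropped during training

# A 39-letter word with 34 phones only fits the last bucket above.
print(pick_bucket("x" * 39, ["p"] * 34))  # -> 2
```

If the largest bucket were smaller than the longest words in the lexicon, those pairs would be dropped and the model would never learn to terminate outputs of that length, which is consistent with raising max_length improving the results here.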

You'd better try a modern transformer architecture rather than this seq2seq model.

nshmyrev · Jul 29 '20 08:07
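The repetition itself is a familiar seq2seq decoding failure: for inputs longer than anything seen in training, the decoder never gives enough probability to the end-of-sequence token and loops on the last phone. A generic decode-time guard can trim the symptom by capping decode length and cutting off runaway repeats; the `step` function below is a stand-in for any greedy decoder, not part of g2p-seq2seq's API:

```python
# Sketch of a decode-time guard against runaway repetition, assuming a
# generic step(prev_token, state) -> (next_token, state) greedy decoder.
# Names here are hypothetical, not g2p-seq2seq's API.
def greedy_decode(step, start_token, eos_token, max_steps=60, max_repeat=3):
    output, token, state = [], start_token, None
    for _ in range(max_steps):
        token, state = step(token, state)
        if token == eos_token:
            break
        output.append(token)
        # Stop if the decoder keeps emitting the same phone: a common
        # failure mode for inputs longer than anything seen in training.
        if len(output) >= max_repeat and len(set(output[-max_repeat:])) == 1:
            output = output[:-(max_repeat - 1)]  # keep a single instance
            break
    return output
```

This only patches the output, though; retraining with a larger max_length, or switching architectures as suggested above, addresses the cause.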

Alright, which ones would you suggest?

entenbein · Jul 29 '20 08:07

Maybe https://github.com/hajix/G2P

nshmyrev · Jul 29 '20 08:07