deep-voice-conversion
Question on train1.py's and train2.py's runtimes.
Hi,
I'm trying this out, but I only have a GTX 960 GPU. I was wondering what kind of training runtimes you're seeing for these two scripts — will it take me weeks, days, or hours? When I started `python train1.py`, a few lines of logging came out, but it seems to have frozen already, with no sign of life and no error.
I hope you can give me an idea what to expect.
Thank you very much, =)
I actually don't know how it compares to the GTX 960, but I was using a p2.xlarge (which is half of an NVIDIA Tesla K80) on AWS, and train1 took about 72 hours to converge to 70% accuracy.
@jswilson would you mind sharing how long train2 took you, if you tried it on the arctic dataset?
I've been running Train2 for over a day now and my results are more erratic than a mountain goat on LSD, so I'm a tad concerned that it'll never converge.
@VictoriaBentell any news? Did it converge?
No, I assume that it's similar to this issue. It seems like the arctic dataset itself might be the problem, and so I'm going to try and find another dataset to replace that.
@VictoriaBentell May I ask: when you were running train2, did you use your own voice corpus? If so, how did you build it?
Originally I was using the arctic dataset, but I haven't done anything with it since my last response, so I'll try making my own corpus and get back to you on the results. As far as I'm aware, it should be as straightforward as replacing each wav file in bdl and slt with any source and target respectively. As long as the source and target are saying the same things for 3 or 4 seconds each, the result should be fine... I hope.
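For anyone else swapping in their own corpus this way, a quick sanity check that the two directories really are parallel (same file names, roughly matching durations) can save a wasted training run. This is just a sketch, not part of the repo — the bdl/slt directory names come from the arctic layout, and the 1-second duration-gap threshold is an arbitrary assumption:

```python
import os
import wave

def check_parallel_corpus(src_dir, tgt_dir, max_duration_gap=1.0):
    """Report wav files that are missing from one side, or whose
    durations differ by more than max_duration_gap seconds."""
    src = {f for f in os.listdir(src_dir) if f.endswith(".wav")}
    tgt = {f for f in os.listdir(tgt_dir) if f.endswith(".wav")}

    def seconds(path):
        # Duration in seconds = frame count / sample rate.
        with wave.open(path, "rb") as w:
            return w.getnframes() / w.getframerate()

    problems = []
    # Files present on only one side (symmetric difference).
    for name in sorted(src ^ tgt):
        problems.append((name, "unpaired"))
    # Paired files whose lengths diverge too much to be the same utterance.
    for name in sorted(src & tgt):
        gap = abs(seconds(os.path.join(src_dir, name)) -
                  seconds(os.path.join(tgt_dir, name)))
        if gap > max_duration_gap:
            problems.append((name, f"duration gap {gap:.2f}s"))
    return problems

if __name__ == "__main__":
    # Hypothetical paths mirroring the arctic layout; adjust to your setup.
    for issue in check_parallel_corpus("datasets/arctic/bdl",
                                       "datasets/arctic/slt"):
        print(issue)
```

An empty return list means every utterance is paired and of comparable length, which is about all you can verify automatically before training.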
@VictoriaBentell May I ask whether your model converges now? Did it work out with the arctic dataset? If you have a train2 result, could you please share it so I can give it a try?