Pawel Swietojanski comments

Results 8 comments of Pawel Swietojanski

Question about distortion file

Hi, apologies for the delay with this. We did not release these data augmentation RIRs; instead, you may use the 16 kHz RIRs available from the OpenSLR page. The results...

self-supervised training from scratch

How much data do you train on? In our case, training for an epoch on the 50-hour variant of LibriSpeech took around 30 minutes (or perhaps less) on a...

self-supervised training from scratch

Not sure how Google's Colab assigns resources, but ```--num-workers 16``` only makes sense if you have access to that many CPU cores (on top of a GPU). In that case...
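As a rough illustration of the point about `--num-workers`, the sketch below caps a requested worker count at the number of CPU cores the process can actually use. The helper names `usable_cpu_count` and `pick_num_workers` are hypothetical and are not part of the training scripts discussed in this thread; the result would still have to be passed to `--num-workers` by hand.

```python
import os


def usable_cpu_count() -> int:
    """Return the number of CPU cores this process may run on."""
    # sched_getaffinity respects CPU affinity masks on Linux; it is not
    # available on every platform, so fall back to os.cpu_count().
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def pick_num_workers(requested: int) -> int:
    """Cap the requested number of data loading workers at the usable cores."""
    return max(1, min(requested, usable_cpu_count()))


if __name__ == "__main__":
    print("usable cores:", usable_cpu_count())
    print("workers to use:", pick_num_workers(16))
```

On a hosted notebook with only a couple of cores, this would return a much smaller number than 16, which is the situation the comment warns about.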

self-supervised training from scratch

If it speeds things up, then sure (for our setup 16 was about right). See what seems to be the best setting in your case (this is an overall balancing...

self-supervised training from scratch

Well, it's clear there is a large bottleneck somewhere. It's most likely IO related due to slow disk access (i.e. reading waves, rather than augmenting them later). Where do you...
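One way to check whether disk access is the bottleneck is to time raw file reads in isolation, with no decoding or augmentation. The sketch below is only an assumption about how such a check might look; the function `measure_read_throughput` and its parameters are hypothetical and do not come from the project under discussion.

```python
import time
from pathlib import Path


def measure_read_throughput(directory: str, pattern: str = "*.wav") -> dict:
    """Read every matching file once and report raw disk read throughput."""
    total_bytes = 0
    n_files = 0
    start = time.perf_counter()
    for path in sorted(Path(directory).rglob(pattern)):
        # Read the raw bytes only; no audio decoding is performed here.
        total_bytes += len(path.read_bytes())
        n_files += 1
    elapsed = time.perf_counter() - start
    megabytes = total_bytes / 1e6
    return {
        "files": n_files,
        "megabytes": megabytes,
        "seconds": elapsed,
        "mb_per_second": megabytes / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    print(measure_read_throughput("."))
```

Note that a second run over the same files may be served from the operating system's page cache and look much faster than a cold read.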

self-supervised training from scratch

Thanks for reporting back on this. Do you have any way to get stats on how the machine is being used during a training session? Ideally something along the lines of a screenshot...

self-supervised training from scratch

Thanks. One more thing you may want to try is to limit each data loading thread to one CPU core (at the math/linear algebra library level). Now it looks like each thread...
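A common way to restrict numerical libraries to a single thread is through environment variables, set before those libraries are imported. The sketch below assumes the setup relies on OpenMP, MKL, or OpenBLAS backends; the comment above is truncated, so this may not be the exact method it went on to recommend, and the helper `limit_math_threads` is hypothetical.

```python
import os

# Environment variables read by common numerical backends (OpenMP, MKL, OpenBLAS).
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_math_threads(n_threads: int = 1) -> dict:
    """Set the thread-count variables and return the values that were applied."""
    applied = {}
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
        applied[name] = os.environ[name]
    return applied


if __name__ == "__main__":
    # Call this before importing numerical libraries so they pick the values up.
    print(limit_math_threads(1))
```

The same effect can be obtained from the shell by exporting these variables before launching the training script.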

self-supervised training from scratch

Looks like the overall system is much better balanced now (no race conditions, well-loaded cores). How much data do you pretrain on in this setup, 50 hours? You can see...