Steve Korshakov comments

Results: 169 comments of Steve Korshakov

Can you provide a readme?

Actually, I have almost finished training the NAR model, and it works really well for in-domain samples. You can also download pre-converted datasets using my dataset tool.

Can you provide a readme?

My librilight-preprocessed dataset was my naive attempt to transcribe it, but it failed: there are too many errors, and networks trained on it turned out to have too much...

Can you provide a readme?

They should be exactly the same; all my work is reproducible! So it is up to you.

Can you provide a readme?

I have finished the training and published the results. The network follows the speaker much better than Voicebox, but it is still not as good as it should be for out-of-domain speakers.

Can you provide a readme?

This is a zero-shot voice cloning network, so there is nothing to train here. It just needs a clean 3-5 second sample with its text.

Text to unit training code

I have opted for BigVSAN. I was really impressed by its quality: I wasn't able to spot any difference between synthesized and real audio on my datasets. I have published...

Text to unit training code

I am training on a quite small dataset: libritts-r + vctk. They have only high-quality voices, but I want to try some pre-training on a much bigger one to...

FA2' flash_attn_varlen_func is 300x slower than flash_attn_func

Changing to 16x16 head dimensions reduces the gap to 10x, but it is still very slow.

FA2' flash_attn_varlen_func is 300x slower than flash_attn_func

@tridao Thank you for catching that. After the fix it is still 4x slower than flash_attn_func:

xformers (mask) 0.00034342713200021533
xformers (no mask) 0.0013367030000081285
torch (mask) 0.0034441131959902123
torch (no mask) 0.0013596494959783741...

FA2' flash_attn_varlen_func is 300x slower than flash_attn_func

@zhangjun Thanks! But I am running this code in a notebook, and repeating the cell execution yields similar results.
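
For anyone reproducing timings like the ones in this thread, here is a minimal sketch of a timing harness that discards warm-up runs and reports a median, so one-time setup costs do not dominate. It is not the benchmark used in the thread. The names `benchmark` and `dummy_workload` are hypothetical, and the optional `sync` hook is an assumption: on a GPU it would be something like `torch.cuda.synchronize`, so that asynchronous kernels finish before the clock is read.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Return the median wall-clock time of fn() in seconds.

    Warm-up runs are discarded so one-time costs (compilation, caches,
    lazy initialization) do not inflate the measurement.
    sync is an optional callable invoked before reading the clock
    (assumption: e.g. torch.cuda.synchronize when timing GPU kernels).
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical stand-in workload; replace with the attention call under test.
def dummy_workload():
    return sum(i * i for i in range(10_000))


median_seconds = benchmark(dummy_workload)
print(f"median: {median_seconds:.6f} s")
```

If the gap between two functions stays the same with warm-up and synchronization in place, it is less likely to be a measurement artifact.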