quyuquan2019

Results 4 issues of quyuquan2019

Hi, I have a question. I am using Windows to run your code. How do I obtain your dataset? I ask because I see that your dataset is in .txt format.

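May I ask whether the acoustic features are fed in frame by frame, or flattened into one dimension first? Also, why is the input layer in your code defined as 7774? Is that the feature dimension times the number of frames of the longest utterance, with the rest zero-padded?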
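To make the second option in the question concrete, here is a minimal sketch of flattening each utterance into one vector and zero-padding it to the length of the longest utterance. The function name `pad_and_flatten` and the toy sizes are hypothetical, and this only illustrates what the question asks about; it does not confirm how the repository's code actually builds its 7774-wide input.

```python
def pad_and_flatten(frames, max_frames):
    """Flatten a [num_frames][feat_dim] utterance into one vector,
    zero-padded to max_frames * feat_dim values."""
    feat_dim = len(frames[0])
    if len(frames) > max_frames:
        raise ValueError("utterance is longer than max_frames")
    # Concatenate the frames in time order.
    flat = [value for frame in frames for value in frame]
    # Fill the remaining positions with zeros.
    flat.extend([0.0] * ((max_frames - len(frames)) * feat_dim))
    return flat


# Toy data (assumed): two utterances with 3-dimensional features.
utterances = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],  # 2 frames
    [[7.0, 8.0, 9.0]],                   # 1 frame
]
max_frames = max(len(u) for u in utterances)
flattened = [pad_and_flatten(u, max_frames) for u in utterances]
```

With this layout every input vector has `max_frames * feat_dim` entries, which is the kind of product the question guesses might explain a fixed input size.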

I ran into a problem during training that has puzzled me for a day. Following your network structure, I use 230 speakers, the input shape is [20000, 400, 24], and I use an RTX2080...

From your code xvector-gpu.ipynb, the network structure is: Sequential( (0): TDNN() (1): TDNN() (2): TDNN() (3): TDNN() (4): TDNN() (5): StatsPooling() (6): FullyConnected( (hidden1): Linear(in_features=3000, out_features=512, bias=True) (hidden2):...
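For context on the `StatsPooling()` layer in that printout: in the usual x-vector design, statistics pooling concatenates the per-channel mean and standard deviation over time, so the pooled vector is twice as wide as the frame-level output. A rough sketch in plain Python follows. The function `stats_pooling` is a hypothetical stand-in, not the repository's class, and reading `in_features=3000` as 2 x 1500 channels is an assumption, since `TDNN()` is printed without its sizes.

```python
import math


def stats_pooling(frames):
    """frames: list of T frames, each a list of C channel values.
    Returns a list of length 2 * C: the means, then the std devs."""
    num_frames = len(frames)
    num_channels = len(frames[0])
    means = [
        sum(frame[c] for frame in frames) / num_frames
        for c in range(num_channels)
    ]
    # Population (biased) standard deviation over the time axis;
    # an implementation may use the unbiased estimator instead.
    stds = [
        math.sqrt(sum((frame[c] - means[c]) ** 2 for frame in frames) / num_frames)
        for c in range(num_channels)
    ]
    return means + stds


# Toy input (assumed): 2 frames, 2 channels.
pooled = stats_pooling([[1.0, 2.0], [3.0, 6.0]])
```

The output length depends only on the number of channels, not on the number of frames, which is why a fixed-size `Linear` layer can follow the pooling step even when utterances differ in length.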