FullSubNet: Does the pretrained model available in the releases folder work with a 48k sampling rate?

@haoxiangsnr

Hello,

Does the FullSubNet model work with a 48k sampling rate at inference time?

Regards Yugesh

Feb 22 '21 06:02 yugeshav

Hi, you need to downsample to 16K first.

Feb 23 '21 09:02 haoxiangsnr

Does your model have any option to resample the audio data?

Feb 23 '21 10:02 yugeshav

Maybe you could use sox for resampling. Here is an example of how to do it:

sox filename.wav -r 16000 filename_16000.wav

Check this link for more info: https://stackoverflow.com/questions/23980283/sox-resample-and-convert

Feb 23 '21 11:02 haoxiangsnr

Sorry, I think you can directly use the FullSubNet model to enhance the 48K wav file at inference time.

Check this line of the project. When loading, Librosa will resample the wav file to 16K, regardless of the original sampling rate.

However, you should note that after enhancement, the saved wav file is 16K.

Feb 23 '21 11:02 haoxiangsnr

Thanks for the details. I tried running inference on a 48k audio file and saved the output at 16k, but the speech quality is completely lost, and sometimes there is no speech at all. Is this expected behavior for your model?

Feb 23 '21 13:02 yugeshav

Could you please send me the wav file and the inference config?

Feb 24 '21 00:02 haoxiangsnr

The input file is uploaded at this link: https://drive.google.com/file/d/1UVejws8QuAtDWuA3cyCU6nMNp1Gv2E-L/view?usp=sharing

Code changes are in config/inference/fullsubnet.toml

inherit = "config/common/fullsubnet_inference.toml"

[dataset]
path = "dataset.DNS_INTERSPEECH_inference.Dataset"

[dataset.args]
noisy_dataset = "/root/data_3tb_2/Experiments_Yugesh/Yugesh_FSN/FullSubNet-main/rc14_48k"
limit = false
offset = 0
sr = 48000

In src/inferencer/DNS_INTERSPEECH.py Line 162

op_dir = "/root/data_3tb_2/Experiments_Yugesh/Yugesh_FSN/FullSubNet-main/outputs"
op_dir = op_dir + '/'+name+'.wav'
sf.write(op_dir, enhanced, samplerate=16000)

Feb 24 '21 04:02 yugeshav

You will get the correct result by changing sr = 48000 to sr = 16000 in the inference/fullsubnet.toml, I presume?

With sr = 48000, Librosa loads wav files by resampling from the original sampling rate (in your case, 48K) to 48K, which means no change. However, the pre-trained model is for 16K wav files.

If you set sr = 16000, Librosa will load wav files by resampling the original sampling rate (in this case, 48K) to 16K.
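To see why loading at 48K breaks things, here is a rough arithmetic sketch. It assumes a 512-point FFT, which is a guess based on the 257 frequency bins mentioned elsewhere in this thread and is not taken from the project config; the function names are just for illustration:

```python
N_FFT = 512  # assumption: 257 frequency bins = N_FFT // 2 + 1


def bin_width_hz(sample_rate, n_fft=N_FFT):
    """Width of one STFT frequency bin in Hz."""
    return sample_rate / n_fft


def top_frequency_hz(sample_rate):
    """Highest frequency representable at this sampling rate (Nyquist)."""
    return sample_rate / 2


for sr in (16000, 48000):
    print(
        f"sr={sr}: {N_FFT // 2 + 1} bins, "
        f"{bin_width_hz(sr)} Hz per bin, "
        f"covering 0 to {top_frequency_hz(sr)} Hz"
    )
```

Under that assumption, the same 257 bins cover three times the frequency range at 48K as they do at 16K, so a model trained on 16K spectra gets input that no longer lines up with what it learned.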

Feb 24 '21 07:02 haoxiangsnr

Okay, so the FullSubNet model can only process 16k inputs, and if we give it 48k audio, Librosa will take care of the resampling?

Thanks a lot for the detailed info @haoxiangsnr

Feb 24 '21 09:02 yugeshav

@yugeshav can you share the pretrained model ?

Mar 10 '21 08:03 ahmedbahaaeldin

The pre-trained model is in here: https://github.com/haoxiangsnr/FullSubNet/releases

Mar 10 '21 10:03 yugeshav

@yugeshav which one from the archive/data file should I pick for the best performance?

Mar 10 '21 11:03 ahmedbahaaeldin

As per the author, it is fullsubnet.

Mar 10 '21 12:03 yugeshav

@yugeshav I changed the input to a 16k sample rate, reshaped it to (1,1,257,-1), and passed it forward through the network. The output shape is (1,2,257,-1). Is this the correct way to use it? The output sounds like noise. Or should there be some preprocessing? @haoxiangsnr

Mar 10 '21 12:03 ahmedbahaaeldin
