rnnoise
Support for ffmpeg arnndn filter?
Thanks for a really cool filter! Got lots of potential!
I've run a few training sessions on my own material and dumped the weights using
python dump_rnn.py weights.hdf5 ..\src\rnn_data.c filter.rnnn orig
But when I try to use the models in ffmpeg as
ffmpeg -i in.wav -af "arnndn=m=filter.rnnn" out.wav
I always get
Error initializing filter 'arnndn' with args 'm=filter.rnnn'
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
Did I miss something? If not, what are the requirements for getting the models to work with ffmpeg? My models look a lot like those at https://github.com/GregorR/rnnoise-models, but I'm still getting this error...
Does it start with "rnnoise-nu model file version 1"?
Yes, am I missing something? You can download one of the models from https://drive.google.com/file/d/1Vc2vw-TF7gCAPwdYvdyS74iwNw1VatMW/view?usp=sharing
All models have 87526 words (as counted by vim); your file has hundreds more. The ffmpeg parser does not skip surplus numbers, so it ends up reading a weight where it expects an entry count and errors out on a negative value, because it checks that the number of entries is positive. The file also has float values where it should not.
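To compare a file against the 87526 figure without vim, the sketch below counts whitespace-separated tokens (roughly what vim's word count reports) and computes the total implied by the layer sizes. The layer sizes are an assumption: the standard rnnoise network with 42 input features, a 24-unit dense layer, GRUs of 24, 48 and 96 units, a 22-band gain output and a 1-unit VAD output. With those sizes the total does come out to 87526. The names `check_model`, `section_words` and `expected_words` are hypothetical.

```python
# Hypothetical checker for rnnoise-nu v1 model files.
HEADER = "rnnoise-nu model file version 1"

# Assumed layer sizes of the standard rnnoise network: (kind, nb_inputs, nb_neurons).
# Layer names in the comments are assumed to follow rnnoise's rnn_data.c.
LAYERS = [
    ("dense", 42, 24),   # input_dense
    ("gru", 24, 24),     # vad_gru
    ("gru", 90, 48),     # noise_gru
    ("gru", 114, 96),    # denoise_gru
    ("dense", 96, 22),   # denoise_output
    ("dense", 24, 1),    # vad_output
]


def section_words(kind: str, nb_inputs: int, nb_neurons: int) -> int:
    """Words in one section: 3 header values, then weights, then bias."""
    if kind == "dense":
        return 3 + nb_inputs * nb_neurons + nb_neurons
    # GRU: input weights, recurrent weights and bias, each for 3 gates.
    return 3 + 3 * (nb_inputs * nb_neurons + nb_neurons * nb_neurons + nb_neurons)


def expected_words() -> int:
    """Header words plus every section, for the assumed layer sizes."""
    return len(HEADER.split()) + sum(section_words(*layer) for layer in LAYERS)


def check_model(path: str) -> dict:
    """Summarize a model file: header, word count and float-looking tokens."""
    with open(path) as f:
        text = f.read()
    tokens = text.split()
    return {
        "header_ok": text.startswith(HEADER),
        "words": len(tokens),
        "expected_words": expected_words(),
        "float_tokens": sum("." in t for t in tokens),
    }


print(expected_words())  # 87526
```

Calling `check_model("filter.rnnn")` then shows at a glance whether a file has surplus words or stray floats.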
I also noticed the floats, but converting them to ints didn't change the outcome. I don't quite follow you on the negative numbers, though, as this model (by Gregor Richards) https://drive.google.com/file/d/13KNjCkm6snQmpDE-E-sCTulRGQs3zbYx/view?usp=sharing also has negative numbers but still works with ffmpeg. Does the word count matter? (I haven't checked the ffmpeg code.)
Does this mean that the rnnoise dump utility ultimately doesn't create compatible model files for ffmpeg?
Maybe the utility adds random data at the end of a line. But the first numbers say how big the array is in each dimension, and the linked file definitely has surplus entries. Does your model work with the rnnoise code in this repo?
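Because the values are read one after another, much like repeated `fscanf("%d")` calls, a surplus entry shifts everything after it: the next section's size fields are taken from weight values, and a negative one is rejected. The sketch below models that for one GRU section followed by the next section's header. The function names and the validation rule (integers only, non-negative size fields) are assumptions based on the description above, not ffmpeg's actual code.

```python
from typing import Iterator, List


class ModelFormatError(ValueError):
    """Raised when the token stream does not match the expected layout."""


def int_tokens(text: str) -> Iterator[int]:
    # Whitespace-separated integers; line breaks carry no meaning here.
    for tok in text.split():
        try:
            # int("1.0") fails outright; fscanf("%d") would read 1 and then
            # fail on ".0", so a float breaks parsing either way.
            value = int(tok)
        except ValueError:
            raise ModelFormatError(f"not an integer: {tok!r}") from None
        yield value


def read_size(tokens: Iterator[int]) -> int:
    value = next(tokens)
    if value < 0:
        raise ModelFormatError(f"negative size field: {value}")
    return value


def read_gru_then_next_header(text: str) -> List[int]:
    """Read one GRU section, then return the next section's 3 header values."""
    tokens = int_tokens(text)
    try:
        nb_inputs = read_size(tokens)
        nb_neurons = read_size(tokens)
        read_size(tokens)  # activation code
        count = 3 * nb_neurons * (nb_inputs + nb_neurons + 1)
        for _ in range(count):
            next(tokens)  # weights and bias: any integer is accepted
        return [read_size(tokens) for _ in range(3)]
    except StopIteration:
        raise ModelFormatError("file ended early") from None
```

With a 1-input, 1-neuron GRU, three extra negative bias values on the bias line are enough to make the following header read as `-5`, which is rejected.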
Interesting, so the numbers at the beginning of each section describe the size of each dimension. I haven't checked whether it works with the rnnoise utility; I'll have to get back to you on that, but since it follows the required 1ch RAW PCM s16le 48kHz format, I'd expect it to work.
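To feed the rnnoise utility from a WAV file, the samples have to be written out as headerless 1ch s16le at 48 kHz. A 16-bit PCM WAV already stores its samples little-endian, so they can be copied as-is. A sketch using Python's standard `wave` module, assuming the input is already mono, 16-bit and 48 kHz (it checks rather than converts); the function name is hypothetical.

```python
import io
import wave
from typing import BinaryIO


def wav_to_raw_s16le(src: BinaryIO, dst: BinaryIO) -> int:
    """Copy the samples of a mono 16-bit 48 kHz PCM WAV to dst as raw s16le.

    Returns the number of samples written. Raises ValueError for any other
    format instead of converting it.
    """
    with wave.open(src, "rb") as w:
        fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
        if fmt != (1, 2, 48000):
            raise ValueError(f"need mono, 16-bit, 48000 Hz; got {fmt}")
        frames = w.readframes(w.getnframes())
    dst.write(frames)  # WAV PCM sample data is little-endian already
    return len(frames) // 2


# Self-contained demo: build a tiny WAV in memory, then convert it.
wav = io.BytesIO()
with wave.open(wav, "wb") as out_wav:
    out_wav.setnchannels(1)
    out_wav.setsampwidth(2)
    out_wav.setframerate(48000)
    out_wav.writeframes(b"\x01\x00\xff\x7f")  # two samples: 1 and 32767
wav.seek(0)
raw = io.BytesIO()
print(wav_to_raw_s16le(wav, raw), raw.getvalue())
```

For real files, open both with `open(path, "rb")` / `open(path, "wb")` and pass the file objects.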
OK so I've tested my model with the rnnoise code of this repo and it works. It produces a valid RAW PCM file with noise reduced (as much as one can expect from 5 epochs).
So you have the .c file of the model? Can you share it?
You can find my latest .c file here.
It seems as if something has been added to the GRU layers, as per the sections below.
24 24 0: 3600, should be 3528 (3*24 words added)
90 48 2: 20160, should be 20016 (3*48 words added)
114 96 0: 61056, should be 60768 (3*96 words added)
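The expected counts follow from the layout of a GRU section: input weights (nb_inputs × nb_neurons × 3), recurrent weights (nb_neurons × nb_neurons × 3) and a bias of nb_neurons × 3 values. A quick check, assuming the three header numbers are nb_inputs, nb_neurons and the activation code, and that the counts above exclude those three values:

```python
def gru_values(nb_inputs: int, nb_neurons: int) -> int:
    """Values in a GRU section after its header line.

    input weights (nb_inputs * nb_neurons * 3)
    + recurrent weights (nb_neurons * nb_neurons * 3)
    + bias (nb_neurons * 3)
    """
    return 3 * nb_neurons * (nb_inputs + nb_neurons + 1)


# (nb_inputs, nb_neurons, values found in the exported file), from the sections above
observed = [(24, 24, 3600), (90, 48, 20160), (114, 96, 61056)]

for nb_inputs, nb_neurons, found in observed:
    expected = gru_values(nb_inputs, nb_neurons)
    surplus = found - expected
    print(f"{nb_inputs} {nb_neurons}: expected {expected}, found {found}, "
          f"surplus {surplus} = 3*{surplus // 3}")
```

Each surplus equals 3 × nb_neurons, i.e. exactly one extra bias-sized block per GRU layer.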
I just added code to the arnndn filter that skips to the end of the line. Feel free to try it and report whether it sounds correct with your model file(s). You might still want to convert the doubles to ints, because that is not the standard format for models.
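A skip-to-end-of-line step after each array tolerates surplus values as long as they sit on the same line as the array they follow. A sketch of that behaviour, assuming values are read like `fscanf("%d")` (crossing line breaks if needed) and the rest of the line is then discarded; `read_array` is a hypothetical name and works on a string rather than a C stream.

```python
import re
from typing import List, Tuple

# Optional leading whitespace (including newlines), then a decimal integer.
_INT = re.compile(r"\s*([-+]?\d+)")


def read_array(text: str, pos: int, n: int) -> Tuple[List[int], int]:
    """Read n integers from text starting at pos, then skip past the next newline.

    Values may span several lines, but whatever follows the n-th value on its
    line is discarded, like a skip-to-end-of-line step.
    """
    values: List[int] = []
    for _ in range(n):
        m = _INT.match(text, pos)
        if m is None:
            raise ValueError(f"expected an integer at offset {pos}")
        values.append(int(m.group(1)))
        pos = m.end()
    newline = text.find("\n", pos)  # a '\r' before it (DOS endings) is skipped too
    return values, (len(text) if newline == -1 else newline + 1)
```

If an exporter omits the line break between two arrays, the skip swallows the second array as well, and the reader falls out of step.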
Thank you! I'll check the next build and report back here once I've tested it. Strange that the rnnoise utility writes only certain sections as floats in the RNN file.
I've built ffmpeg from master (and verified your commit was the latest one), but it still won't accept the model file I built with rnnoise. To sum things up (maybe this is a bug in rnnoise, maybe not):
This model file built by Gregor Richards works.
This model built by me using the rnnoise dump utility (weights to model) doesn't work (I've removed the generated floats).
This is the .c file created by the rnnoise utility (which matches the model I've built, also with the generated floats removed).
There seem to be 3*nb_neurons extra entries in the data of each GRU layer (although they look like bias vectors at first, it doesn't seem like they are). It would be interesting to hear what @jmvalin or @GregorR thinks!
Do not remove the floats, just convert them to ints, i.e. remove the .0 part. I will check whether my changes work with DOS line endings.
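Converting rather than deleting keeps the number of values intact. A minimal sketch, assuming the stray floats are all integral (like `12.0`); it refuses anything with a real fractional part rather than silently rounding, and keeps one output line per input line. The function names are hypothetical.

```python
import re

# An integral value written as a float, e.g. "12.0", "-3.00" or "7."
_INTEGRAL_FLOAT = re.compile(r"(-?\d+)\.0*")


def fix_token(tok: str) -> str:
    if "." not in tok:
        return tok
    m = _INTEGRAL_FLOAT.fullmatch(tok)
    if m is None:
        raise ValueError(f"non-integral value, refusing to round: {tok!r}")
    return str(int(m.group(1)))  # "-0.0" becomes "0"


def floats_to_ints(text: str) -> str:
    """Rewrite integral floats as ints, preserving line breaks and endings."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        out.append(" ".join(fix_token(t) for t in body.split()) + ending)
    return "".join(out)
```

Reading the model with `open(path, newline="")` keeps its original line endings through the round trip.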
It works even with DOS line endings, so I don't know what the problem is now...
Sorry, I was unclear... I meant that I converted the floats to ints...
Please upload that new file somewhere.
Which file? My last uploads are mentioned above.
Well, I downloaded that file above and compared it with the previous version: it's different and is also missing several lines of data.
Aha, well it was just a newer model from a different training but still exported using rnnoise, so it should have the same basic structure. Are you saying it doesn't?
It is missing newlines, so the export to the .rnnn file is buggy.
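On the export side, a layout that a skip-to-end-of-line reader can follow is: for each layer, one line with nb_inputs, nb_neurons and the activation code, then one line per array. The sketch below writes a GRU section that way from already-quantized integer weights and fails loudly on wrong array lengths, so an oversized bias is caught at export time. The function names and flat-list inputs are assumptions; quantization is left out.

```python
import io
from typing import Sequence, TextIO


def write_array(out: TextIO, values: Sequence[int]) -> None:
    """Write one array on its own line, as plain integers."""
    for v in values:
        if int(v) != v:
            raise ValueError(f"value is not an integer: {v!r}")
    out.write(" ".join(str(int(v)) for v in values) + "\n")


def write_gru_section(out: TextIO, nb_inputs: int, nb_neurons: int, activation: int,
                      input_weights: Sequence[int], recurrent_weights: Sequence[int],
                      bias: Sequence[int]) -> None:
    """Header line, then input weights, recurrent weights and bias, one per line."""
    n3 = 3 * nb_neurons
    for name, arr, size in (("input_weights", input_weights, nb_inputs * n3),
                            ("recurrent_weights", recurrent_weights, nb_neurons * n3),
                            ("bias", bias, n3)):
        if len(arr) != size:
            raise ValueError(f"{name} has {len(arr)} values, expected {size}")
    out.write(f"{nb_inputs} {nb_neurons} {activation}\n")
    write_array(out, input_weights)
    write_array(out, recurrent_weights)
    write_array(out, bias)


buf = io.StringIO()
write_gru_section(buf, 1, 1, 0, [1, 2, 3], [4, 5, 6], [7, 8, 9])
print(buf.getvalue(), end="")
```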
I will look further into this tomorrow, but I'm curious: which reference implementation was the arnndn filter based on to begin with, and which models were tested?
All of Gregor's models, including the one from this repo, work just fine.
OK, so I've taken a deeper look into this, and it turns out it's the bias vectors of the GRU layers after all. Sometimes the bias vectors are longer than 3*neurons (in my case, twice as long). The offending line is this one in ffmpeg's libavfilter/af_arnndn.c:
#define INPUT_GRU(name) do { \
    INPUT_VAL(name->nb_inputs); \
    INPUT_VAL(name->nb_neurons); \
    ret->name ## _size = name->nb_neurons; \
    INPUT_ACTIVATION(name->activation); \
    NEW_LINE(); \
    INPUT_ARRAY3(name->input_weights, name->nb_inputs, name->nb_neurons, 3); \
    NEW_LINE(); \
    INPUT_ARRAY3(name->recurrent_weights, name->nb_neurons, name->nb_neurons, 3); \
    NEW_LINE(); \
    INPUT_ARRAY(name->bias, name->nb_neurons * 3); /* <-- bias vectors can be longer than 3*nb_neurons */ \
    NEW_LINE(); \
} while (0)
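The macro reads exactly nb_neurons * 3 bias values. A bias exactly twice that long would be consistent with a Keras GRU built with `reset_after=True` (the default in TensorFlow 2), which stores its bias with shape (2, 3 * units): an input-side row and a recurrent-side row. That is a hypothesis, not something established in this thread. Under that assumption, and if the dump flattens the bias row by row, the sketch below splits it into 3 * nb_neurons blocks and keeps the first, which is what the reader consumes before skipping to the end of the line. The second block is then simply lost, so the result only approximates the trained network. `split_gru_bias` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_gru_bias(bias: Sequence[int], nb_neurons: int) -> Tuple[List[int], List[List[int]]]:
    """Split a flattened GRU bias into blocks of 3 * nb_neurons values.

    Returns (kept, dropped): the first block, which is all a reader taking
    nb_neurons * 3 values will use, and any further blocks it skips.
    """
    n3 = 3 * nb_neurons
    if len(bias) == 0 or len(bias) % n3 != 0:
        raise ValueError(f"bias length {len(bias)} is not a positive multiple of {n3}")
    blocks = [list(bias[i:i + n3]) for i in range(0, len(bias), n3)]
    return blocks[0], blocks[1:]


kept, dropped = split_gru_bias(list(range(12)), nb_neurons=2)
print(kept, dropped)  # [0, 1, 2, 3, 4, 5] [[6, 7, 8, 9, 10, 11]]
```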
I guess the rnnoise-nu model file format is a bit flawed, since it should include the bias vector lengths. In any case, maybe it would be a good idea to add some code that scans until EOL (with some limits) as a workaround for the v1 file format?
That is what NEW_LINE does. But the other file you provided is even more broken. Note that the extra bias entries are not used by the code, nor by rnnoise at all.
So only bias vectors up to 3*neurons are used by ffmpeg? If that's the case, then I guess the dump utility in rnnoise is flawed in not limiting its output to that.
Look at the rnnoise code in this repo; the ffmpeg arnndn filter code is derived from it.
I'm certainly not an expert on Keras but why would one want to skip certain bias vectors? Wouldn't it make sense to correct as much as possible in the chain of layers? Wouldn't this cause gradient descent to require more epochs (or worse)?
I'm having a hard time getting it to run in ffmpeg. It seems the last version was updated on Sep 2, 2018 (https://github.com/GregorR/rnnoise-models). Maybe it doesn't work with the latest ffmpeg? I'm pretty new to using models. What I've tried so far:
-af arnndn="E:\rnnoise-models-master\somnolent-hogwash-2018-09-01/sh.rnnn" -f s16le -ac 1 -ar 48000 out.raw
-af "arnndn="E:\rnnoise-models-master\somnolent-hogwash-2018-09-01/sh.rnnn" -acodec pcm_s16le -ar 48000 -f WAV %1%.wav
-af "arnndn=m='E:\rnnoise-models/somnolent-hogwash-2018-09-01/sh.rnnn'" -acodec pcm_s16le -ar 48000 -f WAV %1%.wav
I also tried exporting it as raw: -f s16le -ac 1 -ar 48000 out.raw
but they all lead to pretty much the same error:
Successfully opened the file.
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[aost#0:0/pcm_s16le @ 0000028f074400c0] cur_dts is invalid [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
[AVFilterGraph @ 0000028f0702e5c0] Setting 'model' to value 'E'
[AVFilterGraph @ 0000028f0702e5c0] Setting 'mix' to value 'rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn'
detected 12 logical cores
[Parsed_arnndn_0 @ 0000028f07482e40] [Eval @ 000000e8cd5fed40] Undefined constant or missing '(' in 'rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn'
[Parsed_arnndn_0 @ 0000028f07482e40] Unable to parse option value "rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn"
Error applying option 'mix' to filter 'arnndn': Invalid argument
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
[AVIOContext @ 0000028f07481c00] Statistics: 0 bytes written, 0 seeks, 0 writeouts
Terminating demuxer thread 0
[AVIOContext @ 0000028f07023a00] Statistics: 327714 bytes read, 3 seeks
Conversion failed!
What am I doing wrong? Thanks in advance for any help.
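The log suggests the path was split at the colon after the drive letter: in a filter's option string `:` separates options, so `model` received `E` and the remainder went to the next option, `mix`, with the backslashes consumed as escape characters (note the missing separator in `rnnoise-models-mastersomnolent`). A common way to pass a Windows path is to use forward slashes, escape the colon as `\:` and single-quote the value. The helper below builds such an argument; its name is hypothetical and the quoting reflects our reading of ffmpeg's filtergraph escaping rules, so check it against your ffmpeg build.

```python
def arnndn_arg(model_path: str) -> str:
    """Build an -af value for arnndn with a Windows path escaped for ffmpeg."""
    if "'" in model_path:
        raise ValueError("paths containing a single quote need extra escaping")
    path = model_path.replace("\\", "/")  # forward slashes avoid backslash escapes
    path = path.replace(":", "\\:")  # ':' would otherwise end the option value
    return f"arnndn=m='{path}'"


print(arnndn_arg(r"E:\rnnoise-models-master\somnolent-hogwash-2018-09-01\sh.rnnn"))
# arnndn=m='E\:/rnnoise-models-master/somnolent-hogwash-2018-09-01/sh.rnnn'
```

The printed value would then go inside double quotes on the command line, e.g. `ffmpeg -i in.wav -af "arnndn=m='E\:/rnnoise-models-master/somnolent-hogwash-2018-09-01/sh.rnnn'" out.wav`.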
Why are you discussing the ffmpeg filter in an unrelated project?
I thought this was the related project, since the issue is titled "Support for ffmpeg arnndn filter?". What would be the right one, then?