Support for ffmpeg arnndn filter?

Open sarek9 opened this issue 4 years ago • 31 comments

Thanks for a really cool filter! Got lots of potential!

I've run a few trainings on my own material and have dumped the weights using

python dump_rnn.py weights.hdf5 ..\src\rnn_data.c filter.rnnn orig

But when I try to use the models in ffmpeg as

ffmpeg -i in.wav -af "arnndn=m=filter.rnnn" out.wav

I always get

Error initializing filter 'arnndn' with args 'm=filter.rnnn'
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0

Did I miss something? If not, what are the requirements for getting the models to work with ffmpeg? My models look a lot like those at https://github.com/GregorR/rnnoise-models but I'm still getting this error...

sarek9 avatar Feb 09 '21 20:02 sarek9

Does it start with "rnnoise-nu model file version 1"?
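
A quick way to check this is to look at the first line of the file. This is only a minimal sketch, assuming the header string quoted above sits alone on the first line; the function name `has_rnnn_header` is made up for illustration.

```python
# Header string quoted in the question above.
EXPECTED_HEADER = "rnnoise-nu model file version 1"


def has_rnnn_header(text: str) -> bool:
    """Return True if the first line of the model text is the expected header."""
    lines = text.splitlines()
    first_line = lines[0].strip() if lines else ""
    return first_line == EXPECTED_HEADER
```

It could be called as `has_rnnn_header(open("filter.rnnn").read())`.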

richardpl avatar Feb 09 '21 21:02 richardpl

Yes, am I missing something? You can download one of the models from https://drive.google.com/file/d/1Vc2vw-TF7gCAPwdYvdyS74iwNw1VatMW/view?usp=sharing

sarek9 avatar Feb 09 '21 21:02 sarek9

All models have 87526 words (as counted by vim); your file has hundreds more. The ffmpeg parser does not skip the excess numbers, so it errors on a negative number because it checks that the number of entries is positive. The file also has float values where it should not have any.
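
The same count can be reproduced outside vim. This is a rough sketch, assuming that a "word" is any whitespace-separated token; the figure 87526 is taken from the comment above and the function names are hypothetical.

```python
# Word count reported above for the known-good models.
REFERENCE_WORD_COUNT = 87526


def count_words(text: str) -> int:
    """Count whitespace-separated tokens in the model text."""
    return len(text.split())


def extra_words(text: str, expected: int = REFERENCE_WORD_COUNT) -> int:
    """Return how many tokens the text has beyond the expected count."""
    return count_words(text) - expected
```

A positive result from `extra_words(open("filter.rnnn").read())` would point at surplus entries somewhere in the file.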

richardpl avatar Feb 09 '21 22:02 richardpl

I also noticed the floats, but it didn't change the outcome when I converted them to ints. I don't quite follow you on the negative numbers though, as this model (by Gregor Richards) https://drive.google.com/file/d/13KNjCkm6snQmpDE-E-sCTulRGQs3zbYx/view?usp=sharing also has negative numbers but still works with ffmpeg. Does the word count matter? (I haven't checked the ffmpeg code.)

Does this mean that the rnnoise dump utility ultimately doesn't create compatible model files for ffmpeg?

sarek9 avatar Feb 09 '21 23:02 sarek9

Maybe the utility adds random data at the end of a line. But the first numbers say how big the array is in each dimension, and the linked file definitely has excess entries. Does your model work with the rnnoise code of this repo?

richardpl avatar Feb 09 '21 23:02 richardpl

Interesting, so the numbers at the beginning of each section describe the size of each dimension. I haven't checked if it works with the rnnoise utility. I'll have to get back to you on that, but since it follows the direction of 1ch RAW PCM s16le 48kHz I'd expect it to work.

sarek9 avatar Feb 09 '21 23:02 sarek9

OK so I've tested my model with the rnnoise code of this repo and it works. It produces a valid RAW PCM file with noise reduced (as much as one can expect from 5 epochs).

sarek9 avatar Feb 10 '21 04:02 sarek9

So you have the .c file of the model? Can you share it?

richardpl avatar Feb 10 '21 10:02 richardpl

You can find my latest .c file here.

It seems as if there's something added to the GRU-layers as per the sections below.

24 24 0
3600 should be 3528 (3*24 words added)

90 48 2
20160 should be 20016 (3*48 words added)

114 96 0
61056 should be 60768 (3*96 words added)
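
The "should be" figures can be checked with a little arithmetic. This is a minimal sketch, assuming a GRU section holds input weights, recurrent weights and a bias vector with three gates each (the layout of the `INPUT_GRU` macro quoted later in this thread); the `bias_rows` parameter is a made-up knob to model a bias block that is written twice.

```python
def gru_word_count(nb_inputs: int, nb_neurons: int, bias_rows: int = 1) -> int:
    """Number of values in one GRU section, excluding its three header numbers."""
    input_weights = nb_inputs * nb_neurons * 3
    recurrent_weights = nb_neurons * nb_neurons * 3
    bias = bias_rows * nb_neurons * 3
    return input_weights + recurrent_weights + bias


# (nb_inputs, nb_neurons) pairs taken from the three section headers above.
for nb_inputs, nb_neurons in [(24, 24), (90, 48), (114, 96)]:
    expected = gru_word_count(nb_inputs, nb_neurons)
    doubled = gru_word_count(nb_inputs, nb_neurons, bias_rows=2)
    print(nb_inputs, nb_neurons, expected, doubled)
```

With a single bias block the totals are 3528, 20016 and 60768; with the bias block doubled they become 3600, 20160 and 61056, which are the sizes observed in the dumped file.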

sarek9 avatar Feb 10 '21 11:02 sarek9

I just added a skip-to-new-line step to the arnndn filter code. Feel free to try it and report whether it sounds correct with your model file(s). You might still want to convert the doubles to ints, because that is not the standard format for models.

richardpl avatar Feb 10 '21 13:02 richardpl

Thank you! I'll check the next build and write back here once I've tested it. Strange that the rnnoise utility only sets certain sections to float in the RNN file.

sarek9 avatar Feb 10 '21 14:02 sarek9

I've built ffmpeg from master (verified your commit was the last one) but it still won't accept the model file I built using rnnoise. Just to sum things up (maybe this is a bug in rnnoise, maybe not):

This model file built by Gregor Richards works.

This model built by me using the rnnoise dump utility (weights to model) doesn't work (I've removed the generated floats).

This is the .c file created by the rnnoise utility (which matches the model I've built, also with the generated floats removed).

There seem to be 3 extra entries in the matrix for the GRU layers (although they look like bias vectors at first, it doesn't seem like they are). It would be interesting to hear what @jmvalin or @GregorR thinks!

sarek9 avatar Feb 10 '21 16:02 sarek9

Do not remove the floats, just convert them to ints, i.e. remove the .0 part. I will check if my changes work with DOS line endings.
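
Dropping the .0 part can be scripted. This is a sketch only, assuming the floats in the dumped file are written like `12.0` or `-3.0` (a whole number followed by a zero fraction); the function name is hypothetical and anything with a non-zero fraction is left alone.

```python
import re

# Matches a whole number followed by ".0", ".00", ... as a standalone token.
_INTEGRAL_FLOAT = re.compile(r"(?<![\w.])(-?\d+)\.0+(?![\w.])")


def floats_to_ints(text: str) -> str:
    """Rewrite tokens such as '12.0' as '12', keeping all other text unchanged."""
    return _INTEGRAL_FLOAT.sub(r"\1", text)
```

Because only the matched tokens are rewritten, spacing and line breaks in the file stay as they were.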

richardpl avatar Feb 10 '21 18:02 richardpl

It works even with DOS line endings, so I don't know what the problem is now...

richardpl avatar Feb 10 '21 18:02 richardpl

Sorry, was unclear... meant I converted the floats to ints...

sarek9 avatar Feb 10 '21 18:02 sarek9

Please upload that new file somewhere.

richardpl avatar Feb 10 '21 18:02 richardpl

Which file? My last uploads are mentioned above.

sarek9 avatar Feb 10 '21 18:02 sarek9

Well, I downloaded the file linked above and compared it with the previous version; it's different and is also missing several lines/data.

richardpl avatar Feb 10 '21 18:02 richardpl

Aha, well it was just a newer model from a different training but still exported using rnnoise, so it should have the same basic structure. Are you saying it doesn't?

sarek9 avatar Feb 10 '21 19:02 sarek9

It is missing newlines, so the export to the .rnnn file is buggy.

richardpl avatar Feb 10 '21 19:02 richardpl

I will look further into this tomorrow but I'm curious to know what reference implementation was used for the arnndn filter to start with? Which were the models tested?

sarek9 avatar Feb 10 '21 21:02 sarek9

All of Gregor's models, including the one from this repo, work just fine.

richardpl avatar Feb 10 '21 21:02 richardpl

OK, so I've had a deeper look into this and it turns out it's the bias vectors for the GRU layers after all. Sometimes the bias vectors are larger than 3*neurons (in my case twice as large). The offending line is the marked one in ffmpeg's libavfilter/af_arnndn.c:

#define INPUT_GRU(name) do { \
    INPUT_VAL(name->nb_inputs); \
    INPUT_VAL(name->nb_neurons); \
    ret->name ## _size = name->nb_neurons; \
    INPUT_ACTIVATION(name->activation); \
    NEW_LINE(); \
    INPUT_ARRAY3(name->input_weights, name->nb_inputs, name->nb_neurons, 3); \
    NEW_LINE(); \
    INPUT_ARRAY3(name->recurrent_weights, name->nb_neurons, name->nb_neurons, 3); \
    NEW_LINE(); \
    INPUT_ARRAY(name->bias, name->nb_neurons * 3); /* <-- the bias vector in the file can be longer than 3 * nb_neurons */ \
    NEW_LINE(); \
    } while (0)

I guess the rnnoise-nu model file format is a bit flawed since it should include bias vector lengths but in any case, maybe it's a good idea to add some code to scan until EOL (with some limits) as a work-around for the v1 file format?

sarek9 avatar Feb 11 '21 18:02 sarek9

That is what NEW_LINE does. But the other file you provided is even more broken. Note that the extra bias entries are not used by this code or by rnnoise at all.
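
The read-then-skip behaviour described here can be pictured as follows. This is an illustrative sketch only, not the ffmpeg code: it assumes each array sits on a single line, and `take_values` is a made-up helper that plays the role of `INPUT_ARRAY` followed by `NEW_LINE`.

```python
from typing import List


def take_values(line: str, count: int) -> List[int]:
    """Read the first `count` integers from a line and ignore the rest of it."""
    tokens = line.split()
    if len(tokens) < count:
        raise ValueError(f"expected {count} values, found {len(tokens)}")
    return [int(tok) for tok in tokens[:count]]


# A bias line that is twice as long as needed for a 24-neuron GRU layer.
nb_neurons = 24
bias_line = " ".join(str(i) for i in range(2 * 3 * nb_neurons))
used = take_values(bias_line, 3 * nb_neurons)
print(len(used))
```

Only the first 3 * nb_neurons entries are kept; whatever follows on the same line is skipped.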

richardpl avatar Feb 11 '21 18:02 richardpl

So only bias vectors up to 3*neurons are used by ffmpeg? If that's the case then I guess the dump utility in rnnoise is flawed in not limiting its output to this.

sarek9 avatar Feb 11 '21 18:02 sarek9

Look at the rnnoise code in this repo; the ffmpeg arnndn filter code is derived from it.

richardpl avatar Feb 11 '21 18:02 richardpl

I'm certainly not an expert on Keras but why would one want to skip certain bias vectors? Wouldn't it make sense to correct as much as possible in the chain of layers? Wouldn't this cause gradient descent to require more epochs (or worse)?

sarek9 avatar Feb 12 '21 23:02 sarek9

I have a hard time getting it to run in ffmpeg. It seems that the last version was updated on Sep 2, 2018 (https://github.com/GregorR/rnnoise-models). Maybe it doesn't work with the latest ffmpeg? I am pretty new when it comes to using models. What I tried so far was

-af arnndn="E:\rnnoise-models-master\somnolent-hogwash-2018-09-01/sh.rnnn" -f s16le -ac 1 -ar 48000 out.raw
-af "arnndn="E:\rnnoise-models-master\somnolent-hogwash-2018-09-01/sh.rnnn" -acodec pcm_s16le -ar 48000 -f WAV %1%.wav
-af "arnndn=m='E:\rnnoise-models/somnolent-hogwash-2018-09-01/sh.rnnn'" -acodec pcm_s16le -ar 48000 -f WAV %1%.wav

also tried exporting it raw -f s16le -ac 1 -ar 48000 out.raw

but it all pretty much leads to the following error

Successfully opened the file.
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[aost#0:0/pcm_s16le @ 0000028f074400c0] cur_dts is invalid [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
[AVFilterGraph @ 0000028f0702e5c0] Setting 'model' to value 'E'
[AVFilterGraph @ 0000028f0702e5c0] Setting 'mix' to value 'rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn'
detected 12 logical cores
[Parsed_arnndn_0 @ 0000028f07482e40] [Eval @ 000000e8cd5fed40] Undefined constant or missing '(' in 'rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn'
[Parsed_arnndn_0 @ 0000028f07482e40] Unable to parse option value "rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn"
Error applying option 'mix' to filter 'arnndn': Invalid argument
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
[AVIOContext @ 0000028f07481c00] Statistics: 0 bytes written, 0 seeks, 0 writeouts
Terminating demuxer thread 0
[AVIOContext @ 0000028f07023a00] Statistics: 327714 bytes read, 3 seeks
Conversion failed!

what am I doing wrong? Thanks in advance for any help
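
The log shows the filter argument being split at the drive-letter colon: 'model' is set to 'E' and the rest of the path ends up in 'mix', with the backslashes eaten as escape characters. One possible way around this is to escape the path before putting it into the filter string. The sketch below is an assumption-laden illustration, not something confirmed in this thread: it follows the two escaping levels described in the ffmpeg filtergraph documentation, swaps backslashes for forward slashes, and uses a hypothetical helper name.

```python
def escape_filter_path(path: str) -> str:
    """Escape a file path for use as a filter option value in a filtergraph."""
    # Assumption: forward slashes are accepted in Windows paths here.
    value = path.replace("\\", "/")
    # Level 1 (option value): escape backslash, single quote and colon.
    for ch in ("\\", "'", ":"):
        value = value.replace(ch, "\\" + ch)
    # Level 2 (filtergraph description): escape backslash, quote, brackets, comma, semicolon.
    for ch in ("\\", "'", "[", "]", ",", ";"):
        value = value.replace(ch, "\\" + ch)
    return value


model = r"E:\rnnoise-models-master\somnolent-hogwash-2018-09-01\sh.rnnn"
audio_filter = "arnndn=m=" + escape_filter_path(model)
print(audio_filter)
```

The resulting string is meant to be passed as a single `-af` argument without going through a shell (for example as one element of a `subprocess` argument list); a shell or batch file may need yet another layer of quoting on top.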

Metal-HTPC avatar Jul 29 '23 10:07 Metal-HTPC

Why are you discussing an ffmpeg filter in an unrelated project?

richardpl avatar Jul 29 '23 10:07 richardpl

I thought this was the related project, since the issue is called "Support for ffmpeg arnndn filter?" What would be the right one, then?

Metal-HTPC avatar Jul 29 '23 11:07 Metal-HTPC