RTNeural
Add support for NAM files
Hi Jatin,
how hard would it be to add support for parsing the NAM file format?
https://github.com/sdatkinson/NeuralAmpModelerCore
Just from a quick peek at both sources, the required layers are almost there (except for the WaveNet layer, which seems like a high-level abstraction of existing low-level layers).
I would love to avoid adding too many neural network engines to my project, so if you think it's doable I'll give it a shot.
@christoph-hart I've created a project where I implemented both engines, so users can load NAM and json/aidax files without having to care which one to load. Implementing both engines is straightforward. https://github.com/brummer10/Ratatouille.lv2
Thanks for the input. Adding both engines is definitely an option, but I would love to avoid adding the big fat Eigen library when RTNeural is already in there with what looks to me like 95% of the required feature set.
Hi All!
I think it should be possible to construct a NAM-style model using RTNeural's layers. If I remember correctly, NAM uses a "Temporal Convolutional Network", and I have implemented a couple of those in the past using RTNeural's layers, although there are sometimes variations between those types of networks. Here's an example of a "micro-TCN" implementation that we use as part of RTNeural's test suite. Probably the best route forward would be to use that implementation as a starting point, add whatever might be missing from the NAM model architecture, and adapt the mechanism for loading model weights to match whatever format NAM models use to store their weights. I'd be happy to help with this process as my time allows.
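Just to make the idea concrete, here's a rough sketch of a small dilated-convolution stack using RTNeural's compile-time API. This is not the actual micro-TCN from the test suite; the channel counts, kernel sizes, and dilation rates are made up for illustration, and weights would still need to be loaded before use.

#include <RTNeural/RTNeural.h>

// Illustrative only: a tiny TCN-style stack of dilated 1-D convolutions.
// Channel counts, kernel sizes, and dilation rates are invented for this example.
using TinyTCN = RTNeural::ModelT<float, 1, 1,
    RTNeural::Conv1DT<float, 1, 8, 3, 1>,  // kernel size 3, dilation 1
    RTNeural::TanhActivationT<float, 8>,
    RTNeural::Conv1DT<float, 8, 8, 3, 2>,  // dilation 2
    RTNeural::TanhActivationT<float, 8>,
    RTNeural::Conv1DT<float, 8, 8, 3, 4>,  // dilation 4
    RTNeural::TanhActivationT<float, 8>,
    RTNeural::DenseT<float, 8, 1>>;

float processSample (TinyTCN& model, float x)
{
    return model.forward (&x); // single-sample inference; load weights and call model.reset() beforehand
}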
That said, I'm not sure it would make sense to add support for NAM models directly to RTNeural, since I think it falls a little bit outside the scope of what RTNeural does. I do have some future plans for a sort of "model library" which would have example implementations of several neural network architectures that are commonly used in real-time audio (and maybe other real-time domains as well), and I think having NAM models as part of the model library would be great. However, there are some other changes I want to make to RTNeural before starting on that, so it may be a while before I get there.
Also interested in this. https://github.com/sdatkinson/NeuralAmpModelerCore/issues/49
maybe relevant? https://github.com/Chowdhury-DSP/BYOD/issues/363
Thanks for the input. Adding both engines is definitely an option, but I would love to avoid adding the big fat Eigen library when RTNeural is already in there with what looks to me like 95% of the required feature set.
Just out of curiosity I checked whether we could build NeuralAmpModelerCore against the Eigen library that comes with RTNeural, and yes, it works flawlessly. We could even share the json header.
My 2 cents: we want to avoid boilerplate code on the engine side or even inside the plugin, i.e. we don't really want RTNeural to have methods to parse .nam (or .aidax or whatever) model files (torch weights); we want to adjust the model file that comes out of a training repo and port it to the format used by RTNeural. For example, the Automated-GuitarAmpModelling repo uses torch and creates a model file that is not directly supported by RTNeural, mostly because RTNeural uses the Keras implementation as a reference. I've created this script that simply adapts the model file from Automated-GuitarAmpModelling into what's expected on the RTNeural side. The same could be done for .nam.
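As a rough illustration of that approach (purely a sketch; the key names below are invented, and a real converter would follow the actual schemas of the .nam file and RTNeural's JSON format), the conversion boils down to re-writing one JSON document into another:

#include <fstream>
#include <string>
#include <nlohmann/json.hpp>

// Hypothetical converter skeleton: read a model file exported by a training repo
// and re-emit it in an RTNeural-style layout. All key names here are placeholders.
nlohmann::json convertToRTNeuralJson (const std::string& inputPath)
{
    std::ifstream in { inputPath };
    nlohmann::json source;
    in >> source;

    nlohmann::json target;
    for (const auto& layer : source["layers"]) // "layers" is an assumed key
    {
        nlohmann::json outLayer;
        outLayer["type"] = layer["type"];       // e.g. "lstm", "dense"
        outLayer["weights"] = layer["weights"]; // weights may need re-ordering in practice
        target["layers"].push_back (outLayer);
    }
    return target;
}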
Since the two systems use mostly the same model types, I think NAM would have a strong incentive to adopt RTNeural. It would remove 90% of the complexity from: https://github.com/sdatkinson/NeuralAmpModelerCore/tree/main/NAM
The only blocker seems to be that WaveNet is not implemented (yet) in RTNeural.
Coming back to this as the request from my users keeps popping up.
The only blocker seems to be that WaveNet is not implemented (yet) in RTNeural.
I've tried to naively port over the wavenet.h and wavenet.cpp files from here to use the Layer<T> interface class, and it looks like a very simple copy & paste job for someone who is proficient with either one of the libraries; however, it's outside my comfort zone to offer a serious contribution here. I think we can stay at a high level that doesn't even require branching into the different backends (Eigen, XSIMD, etc.), since it just combines the existing layers.
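To show roughly what I mean, here's a minimal skeleton of such a wrapper. It assumes RTNeural's dynamic Layer<T> interface (constructor taking the in/out sizes, plus virtual reset() and forward()); the NAM-side member and its processSample() call are placeholders for the ported code.

#include <RTNeural/RTNeural.h>

// Skeleton only: wrap a ported NAM-style WaveNet behind RTNeural's dynamic
// Layer<T> interface. "NamWaveNet" is a placeholder for the ported class.
template <typename T>
class WaveNetLayer : public RTNeural::Layer<T>
{
public:
    WaveNetLayer (int in_size, int out_size)
        : RTNeural::Layer<T> (in_size, out_size)
    {
    }

    void reset() noexcept override
    {
        // clear the dilated-convolution history buffers here
    }

    void forward (const T* input, T* out) noexcept override
    {
        // delegate to the ported WaveNet, e.g. (hypothetical call):
        // out[0] = wavenet.processSample (input[0]);
        out[0] = input[0]; // placeholder pass-through
    }

private:
    // NamWaveNet wavenet; // placeholder member for the ported implementation
};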
I've created this script that simply adapt the model file from Automated-GuitarAmpModelling into what's expected on RTNeural side. The same could be done for .nam.
Yes, that could also work, no ambitions here to bloat up the RTNeural project from my side. Whether it's a python script or a conditionally compilable C++ class shouldn't make a big difference in the end (I would prefer the latter, but that's something I can add to my project then).
Hi All,
Sorry for the long delay, I've been a bit busy the past few months. I had a look at implementing one of the NAM architectures in RTNeural a little while ago, and was able to make some progress, but haven't gotten it fully working just yet.
The main issue with re-exporting the weights of a NAM model into RTNeural's JSON format is that RTNeural's JSON format currently only supports "sequential" models, and I don't believe the WaveNet architecture is sequential.
Hopefully I can finish up the WaveNet architecture sometime in the next week or two. That said, in order to fully utilize RTNeural's performance capabilities, it would be preferable to be able to know the network architecture at compile-time, which could pose a problem if the intent is to create a "generic" NAM model loader. I have some ideas about this, but I'll worry about that after the basic implementation work is done.
Thanks, Jatin
That said, in order to fully utilize RTNeural's performance capabilities, it would be preferable to be able to know the network architecture at compile-time, which could pose a problem if the intent is to create a "generic" NAM model loader.
The vast majority of NAM models use the "standard" architecture. There are also three other, less commonly used, official WaveNet presets. Very, very few models will use any other architecture.
If both compile-time and dynamic architectures are supported, then compile-time architectures can be provided for the official presets, along with a dynamic fallback for less common architectures.
Aida-X could be the right approach; there is no need for thousands of model types, so the architectures can be hard-coded. LSTM is right for 99.5% of use cases.
The rationale is that most users download their models from ToneHunt.
A script could remap WaveNet NAM files and retrain them as LSTMs; if the process is optimized on a GPU, converting everything on ToneHunt in a reasonable amount of time could be possible.
This way, RTNeural does not need to change anything, nor does NAM. We need a fat GPU for a month :)
I just made a proof of concept: https://github.com/baptistejamin/nam-to-rtneural. I haven't tested the result yet, but it should be compatible.
Would that script preserve the expected sample rate of the model? NAM supports whatever sample rate your input and output pair is at, whereas the Aida-X script forces 48 kHz. Also, does this shrink the model? In my testing NAM is more accurate than the Aida-X models.
I tested with two different models (clean and high gain), and it sounded exactly the same. RTNeural is so optimized that the LSTM model size could be increased a lot, but I don't think that's required. I own a Kemper, and to be honest, an LSTM of size 16 does a way better job.
Yes, a 48 kHz sampling rate is required; however, the way NAM handles this is by resampling. It's not actually the model that does this, but the plugin host that handles it.
The script does not actually shrink the model. It just generates sound from the NAM core using a NAM model, and then retrains an LSTM compatible with RTNeural (the Aida-X implementation).
So in the end, you have an LSTM model that is compatible with both NAM and RTNeural.
RTNeural models run 10x faster, which enables running models on lower-end CPUs or hardware such as a Raspberry Pi, with super low latency and plenty of spare CPU cycles left to run effects, cab simulation, etc.
Most NAM models use WaveNets lately, while RTNeural models will be LSTMs; this is the key difference. Originally NAM was LSTM-based only, and people were pretty happy with it ;)
Ok, that's good to know. For a lot of my projects I'm cool with LSTM; however, I often want to stack models at various points of the DSP chain, and "oversampling" by feeding high-sample-rate models reduces aliasing, which isn't an issue with a single model but does add up when you start sequencing them.
Just a clarification on how NAM currently works. If you feed the trainer with, say, 96 kHz files, it trains the model at that sample rate, notes the sample rate in the metadata, and then NAM resamples the incoming audio to that sample rate. The model itself requires the audio to be at the sample rate of the input/output files, whether that's 48, 96, 192 kHz, etc., because the weights are all based on that sample rate. Higher-sample-rate models show reduced aliasing, which, as I mentioned above, is a reason for them to exist.
So your script actually generates a model of a model? That would introduce even more loss, wouldn't it? Now you are a further step away from the original capture files. Isn't this similar to converting an AAC into an MP3?
Is there any reason we can't train LSTM models at 96 or 192 kHz and have RTNeural interpret those models? Having tried both NAM and AIDA-X, I've got to say training NAM models is a lot easier, with many more options. This is what makes me think it is worth implementing NAM models rather than just converting them.
Anyway, I don't mean to sound ungrateful; it's actually nice to have a pretty hands-off way to re-train NAMs to Aida-X.
We could retrain at 96 kHz if it's something you are willing to explore. We can do this.
Implementing NAM models in RTNeural, and RTNeural in NAM, seems out of scope, unless RTNeural implements WaveNets. RTNeural is just a core system for machine learning, while NAM is dedicated to guitar amps and re-implements the models in pure C++, but in a less optimized way.
The more optimized the inference engine is, the more powerful models we can get, allowing more layers, etc.
For instance, with RTNeural it could be possible (and Keith Bloemer already did it) to also capture knob effects, for instance to simulate a Fuzz pedal with Stab, Gain, etc. It's something that is not possible with NAM, and that will likely never be.
Is there a better place to continue this discussion that isn't clogging up the issue ticket?
Is there a better place to continue this discussion that isn't clogging up the issue ticket?
I'd be happy to open a channel/thread on the RTNeural Discord if people want to chat more on there?
Also, I know I've been saying this for a while, but I think I should finally have time this weekend to finish my NAM-style WaveNet implementation in RTNeural... we'll see how it goes!
Sounds good, I'm already on the Discord.
Also, I know I've been saying this for a while, but I think I should finally have time this weekend to finish my NAM-style WaveNet implementation in RTNeural... we'll see how it goes!
I'm happy to test integrating it as soon as you've got something functional.
@baptistejamin @rossbalch for AIDA-X related topics in this thread you may be interested in moving here
https://github.com/AidaDSP/Automated-GuitarAmpModelling/issues/9
To all: once the RTNeural engine has support for WaveNet (and elaborations of it), I can expand my current script to generate a json model for RTNeural. I still think that's the best way to support it, but as @jatinchowdhury18 pointed out, support for this architecture needs to be implemented in the engine (I thought this was immediately possible since conv1d layers are already present in RTNeural, but I was wrong). So the scenario could be having a python script, with only torch as a major dependency, that does:
- Automated-GuitarAmpModelling to RTNeural ✅
- Automated-GuitarAmpModelling (Aida DSP fork) to RTNeural → all model variants supported in AIDA-X ✅
- Automated-GuitarAmpModelling (GuitarML fork) to RTNeural ✅
- NAM to RTNeural (pending)
- ToneX to RTNeural (uncertain)
For example, .aidax models on ToneHunt are just RTNeural-compatible json files with the extension changed from .json to .aidax and a metadata section added, as requested by ToneHunt. I would love to see something like this happen; if you have other ideas, let's discuss on the RTNeural Discord. Thanks again @jatinchowdhury18 for bringing this outstanding engine to life!
ToneX would be an interesting one. I think their weights are encrypted based on my poking around in the SQL library.
Okay, I finally have something useful to share on this. Thanks all for your patience. I've put together a repo with a demo of a NAM-style WaveNet model implemented in RTNeural: https://github.com/jatinchowdhury18/RTNeural-NAM
At the moment there seem to be some discrepancies between NAM's convolution layer and RTNeural's, so I'll need to debug that. There are also some missing bits (e.g. gated activations), but I don't think those should be too hard to add now that the base implementation is in place.
The main "issue" I'm imagining for people wanting to use this is that the RTNeural model needs to be defined at compile-time, with parameters taken from the model configuration. For example:
wavenet::Wavenet_Model<float,
RTNeural::DefaultMathsProvider,
wavenet::Layer_Array<float, 1, 1, 8, 16, 3, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512>,
wavenet::Layer_Array<float, 16, 1, 1, 8, 3, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512>>
rtneural_wavenet;
That type definition could be auto-generated without too much trouble, but that doesn't help much if you're planning to load the model at run-time. The RTNeural-Variant repo shows one way to deal with this issue, but it may not work well in this instance given how many parameter configurations NAM's WaveNet supports.
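For reference, the rough shape of that idea: instantiate each supported preset as its own compile-time type and dispatch through a std::variant at run time. Everything below is a placeholder sketch (the extra presets are omitted, and the per-sample forward() call is an assumption about the interface), not working code from the repo.

#include <variant>

// Placeholder sketch of "compile-time presets + run-time dispatch".
// StandardModel reuses the type definition shown above; additional presets
// would be further variant alternatives with their own Layer_Array parameters.
using StandardModel = wavenet::Wavenet_Model<float,
    RTNeural::DefaultMathsProvider,
    wavenet::Layer_Array<float, 1, 1, 8, 16, 3, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512>,
    wavenet::Layer_Array<float, 16, 1, 1, 8, 3, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512>>;

using AnyNamModel = std::variant<StandardModel /*, LiteModel, FeatherModel, ... */>;

float processSample (AnyNamModel& model, float x)
{
    // std::visit keeps each alternative's compile-time-sized implementation,
    // while the caller sees a single run-time type.
    return std::visit ([x] (auto& m) { return m.forward (x); }, model);
}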
On the bright side, in my test example, the RTNeural implementation is running ~2x faster than the NAM implementation on my M1 Macbook. Of course this isn't a fair comparison since the RTNeural implementation isn't correct yet, but it's a good sign! So far I've been using RTNeural's Eigen backend, but I'd love to get the wavenet working with the XSIMD backend as well to see if that might run a bit faster.
Anyway, if anyone's got some time and wants to help with testing/debugging the RTNeural WaveNet, feel free to jump in to the other repo. I'm hoping to have time to get back to it later this week or next weekend.
XSIMD backend
I think it would be a blast; at the same time, from my experience it seems this largely depends on the toolchain and target arch. For example, I get zipper noise with XSIMD on the Mod Dwarf, and I cannot compile at all on the Chaos Audio Stratus. I get XSIMD working fine building AIDA-X in Yocto. So I would be happy to see it running with Eigen first, then of course, the more the better!
On the bright side, in my test example, the RTNeural implementation is running ~2x faster than the NAM implementation on my M1 Macbook.
That is definitely encouraging!
What Tanh implementation are you using? When we switched to using a Tanh approximation in NAM it made a huge performance difference.
What Tanh implementation are you using? When we switched to using a Tanh approximation in NAM it made a huge performance difference.
At the moment, the RTNeural implementation is using Eigen's built-in tanh() method, which I think is what NAM uses by default as well. The idea with the MathsProvider template argument is to make it easy to drop in different implementations of tanh() (or other activation functions), without any run-time cost. However, for the moment I'd prefer to use the same operations on both sides (to the extent possible), just to make it easier to compare both accuracy and performance.
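For anyone who wants to experiment with that in the meantime, a custom maths provider is (at least with the Eigen backend) essentially a struct of static functions operating on Eigen expressions. Here's a rough sketch using a well-known rational tanh approximation; only tanh() is shown, and the exact return convention should be checked against DefaultMathsProvider.

// Sketch only: a drop-in maths provider with a cheap tanh approximation.
// Only tanh() is shown; a full provider would also supply whatever other
// activation maths the model needs, matching DefaultMathsProvider's interface.
struct FastTanhMathsProvider
{
    template <typename Matrix>
    static auto tanh (const Matrix& x)
    {
        // Pade-style rational approximation: tanh(x) ~= x * (27 + x^2) / (27 + 9 * x^2)
        return x.array() * (27.0f + x.array().square()) / (27.0f + 9.0f * x.array().square());
    }
};

// Hypothetical usage, swapping the provider into the wavenet type from earlier:
// wavenet::Wavenet_Model<float, FastTanhMathsProvider, ...> rtneural_wavenet;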
At the moment, the RTNeural implementation is using Eigen's built-in tanh() method, which I think is what NAM uses by default as well.
The NAM fast tanh is optional, but the official plugin has been enabling it since I added the option:
https://github.com/sdatkinson/NeuralAmpModelerPlugin/blob/feafd19ffa025c1c54e51f626cb9b2cf64cc5cd4/NeuralAmpModeler/NeuralAmpModeler.cpp#L73
At the time, applying the tanh activation function was the top hit in the hot path and switching to the fast tanh approximation gave about a 40% performance improvement.
Building on Windows (Visual Studio x64 Release) I get:
RTNeural is: 3.95238x faster
If I enable fast tanh for NAM, I get:
RTNeural is: 2.12809x faster
PR is here to fix the model weight loading:
https://github.com/jatinchowdhury18/RTNeural-NAM/pull/1