VST / Standalone can't load models that have the latent-size cropped
Hi! First, thanks for sharing all this work! It's so much fun to play with these models.
I have an interesting one:
When training a model with --cropped-latent-size 8, training, exporting, and combining all work just fine. But the VST/Standalone fails to load this model, because it still expects a latent size of 128:
[-] Network - No API response
[ ] RAVE - Encode parameters 1
1
8
2048
[ CPULongType{4} ]
[ ] RAVE - Decode parameters 8
2048
2
1
[ CPULongType{4} ]
[ ] RAVE - Prior parameters 1
2048
8
2048
[ CPULongType{4} ]
[ ] RAVE - Latent size 128
[ ] RAVE - Sampling rate: 48000
[+] RAVE - Model successfully loaded: /Users/nunja2/Library/ACIDS/RAVE/secondy.ts.ts
- sr : 48000
- latent size : 128
- full latent size : 128
- ratio : 2048
- prior parameters 1
2048
8
2048
[ CPULongType{4} ]
to low; setting rate to : 11
libc++abi: terminating with uncaught exception of type std::runtime_error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__.py", line 19, in decode
x: Tensor) -> Tensor:
_rave = self._rave
return (_rave).decode(x, )
~~~~~~~~~~~~~ <--- HERE
def encode(self: __torch__.Combined,
x: Tensor) -> Tensor:
File "code/__torch__/___torch_mangle_0.py", line 31, in decode
latent_pca = self.latent_pca
_0 = torch.unsqueeze(torch.numpy_T(latent_pca), -1)
z1 = torch.conv1d(z, _0)
~~~~~~~~~~~~ <--- HERE
latent_mean = self.latent_mean
z2 = torch.add(z1, torch.unsqueeze(latent_mean, -1))
Traceback of TorchScript, original code (most recent call last):
File "combine_models.py", line 36, in decode
@torch.jit.export
def decode(self, x):
return self._rave.decode(x)
~~~~~~~~~~~~~~~~~ <--- HERE
File "export_rave.py", line 159, in decode
def decode(self, z):
if self.trained_cropped: # PERFORM PCA BEFORE PADDING
z = nn.functional.conv1d(z, self.latent_pca.T.unsqueeze(-1))
~~~~~~~~~~~~~~~~~~~~ <--- HERE
z = z + self.latent_mean.unsqueeze(-1)
RuntimeError: Given groups=1, weight of size [8, 8, 1], expected input[2, 128, 1] to have 8 channels, but got 128 channels instead
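From the traceback this looks like a plain shape mismatch: the exported decoder applies the PCA projection with a conv1d whose weight was cropped to 8 latent dimensions, while the VST still hands it the full 128-dimensional latent it announced at load time ("Latent size 128"). Here is a minimal sketch that reproduces the same RuntimeError, using the shapes from the traceback (the random tensors are just stand-ins):

```python
import torch
import torch.nn.functional as F

# Stand-in for the cropped PCA matrix built in export_rave.py
latent_pca = torch.randn(8, 8)
weight = latent_pca.T.unsqueeze(-1)  # conv1d weight of size [8, 8, 1]

# What the VST sends: a full 128-channel latent, as reported at load time
z = torch.randn(2, 128, 1)

# RuntimeError: Given groups=1, weight of size [8, 8, 1],
# expected input[2, 128, 1] to have 8 channels, but got 128 channels instead
F.conv1d(z, weight)
```

The exported model itself seems fine. As a quick sanity check (assuming the block size of 2048 reported in the log above), calling encode from Python shows the cropped latent size:

```python
import torch

model = torch.jit.load("secondy.ts.ts")  # the combined model from the log above
x = torch.zeros(1, 1, 2048)              # one block of audio
print(model.encode(x).shape)             # I'd expect 8 latent channels here, not 128
```

So it looks like the VST sets up its buffers from the full latent size (128) instead of the cropped one (8).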