open-unmix-pytorch
Running umxhq on a large test track (Georgia Wonder - Siren) blows up memory >64GB
Running the umxhq separator with the default Wiener filtering (niter=1) blows up my memory usage when I run umx on the CPU. Is it really supposed to do that?
I could swear this used to run fine before, and I've never had more than 64 GB of RAM. It sounds like a long shot, but I wonder whether an ffmpeg version upgrade could be silently causing more memory to be used.
I just saw the other suggestion to do inference in 30-second chunks, so I'll do it that way.
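For reference, this is roughly the chunking I have in mind (a sketch only; I'm assuming the umxhq() loader returns the Separator and that plain back-to-back stitching is good enough, no overlap-add):

```python
import torch
import torchaudio
import openunmix

separator = openunmix.umxhq(device="cpu", niter=1)

audio, rate = torchaudio.load("mixture.wav")   # (nb_channels, nb_samples)
chunk = 30 * rate                              # 30-second chunks

estimates = []
with torch.no_grad():
    for start in range(0, audio.shape[-1], chunk):
        segment = audio[..., start:start + chunk]
        # Separator expects (nb_samples, nb_channels, nb_timesteps)
        estimates.append(separator(segment[None]))

# stitch back along the time axis:
# (nb_samples, nb_targets, nb_channels, nb_timesteps)
full = torch.cat(estimates, dim=-1)
```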
Hmm, could you check whether the batch_size parameter inside the expectation_maximization method of filtering.py is being used?
If not, it means the system is trying to process the whole track at once, which may be the source of the problem.
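For context, the batching is supposed to look roughly like this (schematic, not the exact code in filtering.py):

```python
import torch

nb_frames, batch_size = 12000, 200   # e.g. a long track, default batch size
for pos in range(0, nb_frames, batch_size):
    # only these frames are combined with the covariance models in one pass,
    # so peak memory should scale with batch_size, not with track length
    t = torch.arange(pos, min(nb_frames, pos + batch_size))
```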
Yes, it is being used (the default of 200).
OK, and when you have 0 iterations, it works fine?
Yes, umxhq(device="cpu", niter=0) works fine. Total memory usage stays around 29 GB, while with niter=1 it grows past 64 GB and the process gets killed. I guess this is a duplicate of https://github.com/sigsep/open-unmix-pytorch/issues/7, which is my bad.
I'm just surprised because it's the first time I've had an issue running a full evaluation.
OK. Oh, I guess I should fix the memory usage, then.
What's the length of the track?
If you'd like, I can take a look with memory_profiler and see whether I can find any savings to contribute back to the project.
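Concretely, I'd wrap the EM routine with memory_profiler's line-by-line profiler, something like this (a sketch; it assumes the umxhq() loader I used above and that patching openunmix.filtering.expectation_maximization is enough to catch the hot path):

```python
import torch
import torchaudio
from memory_profiler import profile

import openunmix
from openunmix import filtering

# wrap the EM routine so memory_profiler reports line-by-line memory usage
filtering.expectation_maximization = profile(filtering.expectation_maximization)

separator = openunmix.umxhq(device="cpu", niter=1)

audio, rate = torchaudio.load("mixture.wav")   # (nb_channels, nb_samples)
with torch.no_grad():
    estimates = separator(audio[None])         # add a batch dimension
```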
Song looks like it's 7:10:
```
(nsgt-torch) sevagh:nsgt $ mpv /run/media/sevagh/windows-games/MDX-datasets/MUSDB18-HQ/test/Georgia\ Wonder\ -\ Siren/mixture.wav
 (+) Audio --aid=1 (pcm_s16le 2ch 44100Hz)
AO: [pulse] 44100Hz stereo 2ch s16
A: 00:00:00 / 00:07:10 (0%) Cache: 429s/81MB
```
Well, OK, we could do that together! Thanks.
(I'm not super available these days, but I'm curious about it.) Normally this batch_size parameter should save quite a lot of RAM, so could you profile as a start to see which tensors are exploding?
Are you tracking gradients, by the way?
I just tried disabling grad on my audio tensor; it didn't save much.
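(What I tried, roughly, reusing the separator and audio from my snippet above:)

```python
import torch

# variant 1: make sure the input tensor itself carries no grad history
audio = audio.detach()

# variant 2: build no autograd graph at all during separation
with torch.no_grad():
    estimates = separator(audio[None])
```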
Some heavy lines from my profiling:
```
278 21639.691 MiB 1933.609 MiB 30 v = torch.mean(torch.abs(y[..., 0, :]) ** 2 + torch.abs(y[..., 1, :]) ** 2, dim=-2)
307 21639.691 MiB 0.000 MiB 54 Cxx = regularization
308 21639.691 MiB 0.000 MiB 270 for j in range(nb_sources):
309 21639.691 MiB 3472.941 MiB 216 Cxx = Cxx + (v[t, ..., j, None, None, None] * R[j][None, ...].clone())
332 48965.359 MiB 3347.324 MiB 516 gain = gain * v[t, ..., None, None, None, j]
333
334 # apply it to the mixture
335 48965.359 MiB -2756.098 MiB 1548 for i in range(nb_channels):
336 48965.359 MiB 8034.758 MiB 1032 y[t, ..., j] = _mul_add(gain[..., i, :], x[t, ..., i, None, :], y[t, ..., j])
```
I thought I could be smart and only apply the Wiener filter below max_bin = bandwidth_to_bin(16000). It saves ~5-10 GB of memory but loses a bit of SDR.
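For completeness, roughly what that looked like (a sketch only; I'm assuming the (nb_frames, nb_bins, nb_channels, 2) real/imaginary layout used in filtering.py and the bandwidth_to_max_bin helper from openunmix.utils, so the slicing may need adjusting):

```python
import torch
from openunmix import filtering
from openunmix.utils import bandwidth_to_max_bin


def wiener_below_16k(spectrograms, mix_stft, rate=44100, n_fft=4096, eps=1e-10):
    """EM-based Wiener filtering below 16 kHz only; mixture phase above.

    spectrograms: (nb_frames, nb_bins, nb_channels, nb_sources) magnitudes
    mix_stft:     (nb_frames, nb_bins, nb_channels, 2) mixture STFT (re/im)
    returns:      (nb_frames, nb_bins, nb_channels, 2, nb_sources)
    """
    max_bin = bandwidth_to_max_bin(rate, n_fft, 16000)

    # the expensive EM runs on the low bins only
    y_low = filtering.wiener(
        spectrograms[:, :max_bin], mix_stft[:, :max_bin], iterations=1
    )

    # above max_bin: keep the magnitude estimates, reuse the mixture phase
    mix_hi = mix_stft[:, max_bin:]                                    # (F, Bh, C, 2)
    mix_mag = torch.sqrt((mix_hi ** 2).sum(dim=-1, keepdim=True)) + eps
    unit = mix_hi / mix_mag                                           # unit modulus
    y_high = spectrograms[:, max_bin:, :, None, :] * unit[..., None]

    return torch.cat([y_low, y_high], dim=1)
```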