
(Paper Error?) MSD Not Used?

Open zaptrem opened this issue 1 year ago • 3 comments

Your paper says "Like prior work, we use multi-scale (MSD) and multi-period waveform discriminators (MPD) which lead to improved audio fidelity."

However, by default it appears the trainer has an empty rates array which means no MSDs are initialized. Was this intentional?

zaptrem avatar Nov 27 '23 20:11 zaptrem

@ritheshkumar95 @eeishaan could you check this issue? It looks like the published configs have an empty rates list, so no MSD discriminator is initialized.

References: https://github.com/descriptinc/descript-audio-codec/blob/main/conf/final/44khz.yml#L16 https://github.com/descriptinc/descript-audio-codec/blob/main/dac/model/discriminator.py#L203
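To make the concern concrete, here is a minimal sketch of how an empty rates list would leave the discriminator stack without any MSD. It is not the code from `discriminator.py`: the names `PeriodDisc`, `ScaleDisc`, `build_discriminators`, and `count_by_type` are hypothetical stand-ins, and the period and rate values are illustrative assumptions rather than values taken from the published config.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class PeriodDisc:
    """Hypothetical stand-in for one multi-period discriminator (MPD)."""
    period: int


@dataclass
class ScaleDisc:
    """Hypothetical stand-in for one multi-scale discriminator (MSD)."""
    rate: int


def build_discriminators(rates: Sequence[int], periods: Sequence[int]) -> List[object]:
    """Build one sub-discriminator per entry in each list (assumed pattern)."""
    discs: List[object] = []
    discs += [PeriodDisc(p) for p in periods]
    # With rates == [], this comprehension yields nothing, so no MSD is created.
    discs += [ScaleDisc(r) for r in rates]
    return discs


def count_by_type(discs: Sequence[object]) -> Dict[str, int]:
    """Count how many sub-discriminators of each class were built."""
    counts: Dict[str, int] = {}
    for d in discs:
        name = type(d).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts


# Illustrative values only; they are assumptions, not the repository's config.
default_like = build_discriminators(rates=[], periods=[2, 3, 5, 7, 11])
with_msd = build_discriminators(rates=[1, 2, 4], periods=[2, 3, 5, 7, 11])

print(count_by_type(default_like))  # {'PeriodDisc': 5}
print(count_by_type(with_msd))      # {'PeriodDisc': 5, 'ScaleDisc': 3}
```

If the real constructor follows this list-comprehension pattern, an empty rates list silently disables MSD rather than raising an error, which would explain why the mismatch with the paper is easy to miss.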

Oktai15 avatar Dec 03 '23 15:12 Oktai15

@pseeth

Oktai15 avatar Dec 03 '23 15:12 Oktai15

Hey, sorry - was on paternity leave. I've also somehow lost access to @pseeth, so continuing on my new account.

Hmm, yes, that's correct - MSD is not used in the final version of DAC. We should probably have said that we investigated the use of MSD and MPD. However, if you look at the ablation section of the paper, in Section 4.5 under "Discriminator", we mention that the use of the MSD, or single-scale discriminator, did not help. We retain MPD only. In the table, you can see a similar result - the top row (MPD, and no scale discriminator) outperforms the last row of the discriminator section.

However, good point that we don't have an ablation with both MPD and MSD on. I do remember trying it, though, and not seeing any significant difference, with the added drawback of slower training. Hope this helps!

seethlord avatar Dec 16 '23 22:12 seethlord