2024-ICLR-READ

Inquiry about Reproducing Audio Corruption Results (Discrepancy with Noise File-Based Corruptions)

Open PolyuAlistair opened this issue 1 month ago • 0 comments

Thank you for sharing your code and for your excellent work. I am writing because I am trying to reproduce the audio corruption experiments from Table 2 on the VGGSound dataset and have encountered a specific issue regarding the noise files.

  1. Observation: Format Mismatch and Its Impact
     The original VGGSound audio clips are mono at 16 kHz, while the provided noise files (e.g., rain.wav, thunder.wav) are stereo at 48 kHz. When I use make_c_audio.py, the corrupted output files inherit the higher sample rate and the channel layout of the noise file. This alters the duration and content of the audio, which is then heavily truncated when run_read.py pads or truncates the input to 1024 frames.

  2. Key Evidence: Gaussian Noise vs. File-Based Noises
     (1) To address this, I converted the five external noise files (rain, thunder, etc.) to mono, 16 kHz to match the original audio format. The critical findings after this conversion are as follows.
     (2) The result for gaussian_noise, which is generated algorithmically and does not rely on an external file, is very close to the value reported in your paper.
     (3) However, the results for the other five corruptions, which rely on external noise files (especially rain and thunder), differ significantly from the reported values. In fact, for rain and thunder on VGGSound, the accuracy for the READ action in my experiments is more than 5% lower than the accuracy for the NONE setting, which seems counter-intuitive.

  3. Request for Clarification
     The fact that only gaussian_noise reproduces correctly strongly suggests that the issue lies in how the external noise files are processed. Could you please clarify the intended setup for these file-based corruptions? It would be extremely helpful if you could specify:
     (1) Noise File Preparation: Were the original stereo, 48 kHz noise files used directly in your experiments, or was there a downsampling or mono conversion step?
     (2) Mixing Parameters: What were the exact parameters used for mixing the other noises? Were they different from the sigma used for Gaussian noise?
     (3) Exact Command: If possible, could you share the exact command-line parameters used for these corruptions in Table 2?
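
For reference, the format check from point 1 and the kind of conversion described in point 2 can be sketched with the Python standard library alone. This is only a minimal illustration, not the script used in these experiments or by the authors. The function names `describe_wav` and `to_mono_16k` are hypothetical. It assumes 16-bit PCM WAV input and a source rate that is an integer multiple of the target rate (48 kHz to 16 kHz is a factor of 3). Averaging groups of samples is a crude stand-in for a properly anti-aliased resampler.

```python
import array
import sys
import wave


def describe_wav(path):
    """Return channel count, sample rate, and duration read from a WAV header."""
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
        return {
            "channels": f.getnchannels(),
            "sample_rate": rate,
            "duration_s": f.getnframes() / rate,
        }


def to_mono_16k(src_path, dst_path, target_rate=16000):
    """Downmix a 16-bit PCM WAV to mono and decimate it to target_rate.

    Assumption: the source rate is an integer multiple of target_rate.
    Returns the number of samples written.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        src_rate = src.getframerate()
        if src.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit PCM")
        if src_rate % target_rate != 0:
            raise ValueError("this sketch assumes an integer rate ratio")
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Downmix: average the interleaved channel values of each frame.
    n_frames = len(samples) // n_channels
    mono = [
        sum(samples[i * n_channels:(i + 1) * n_channels]) / n_channels
        for i in range(n_frames)
    ]

    # Decimate: average each group of `factor` samples (crude low-pass filter).
    factor = src_rate // target_rate
    out = array.array(
        "h",
        (
            int(round(sum(mono[i:i + factor]) / factor))
            for i in range(0, len(mono) - factor + 1, factor)
        ),
    )
    if sys.byteorder == "big":
        out.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
    return len(out)
```

Calling `describe_wav` on a clean clip and on a noise file before mixing makes any sample-rate or channel mismatch visible, and calling it again on the output of `to_mono_16k` confirms that the converted noise matches the mono, 16 kHz format of the original audio.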

Your guidance on this matter would be invaluable for correctly reproducing your results. Thank you very much for your time and support.


PolyuAlistair, Nov 11 '25 14:11