djl

Incorrect Normalization Factor for 16-bit PCM Audio in grab Method

Open leleZeng opened this issue 8 months ago • 2 comments

Description

In the grab method, 16-bit PCM samples are normalized with:

    list.add(buffer.get() / (float) Short.MAX_VALUE);

However, Short.MAX_VALUE is 32767, while 16-bit PCM samples span [-32768, 32767]. This causes:

1. Asymmetric normalization: the largest sample, 32767, maps to exactly 1.0, but the smallest sample maps to -32768 / 32767 ≈ -1.00003, slightly below -1.
2. Out-of-range input: models that expect values strictly within [-1, 1] may misbehave.
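A minimal Java sketch, using only the JDK, that applies the quoted expression to the two extreme 16-bit values to show the asymmetry. The class and method names are hypothetical and not part of DJL.

```java
// Hypothetical demo class; not part of DJL.
public class Pcm16Asymmetry {

    // Mirrors the expression quoted from grab: sample / (float) Short.MAX_VALUE.
    static float normalizeByMaxValue(short sample) {
        return sample / (float) Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        float top = normalizeByMaxValue(Short.MAX_VALUE);    // 32767 / 32767
        float bottom = normalizeByMaxValue(Short.MIN_VALUE); // -32768 / 32767
        System.out.println("max sample -> " + top);
        System.out.println("min sample -> " + bottom);
        System.out.println("min sample below -1? " + (bottom < -1.0f));
    }
}
```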

Expected Behavior

Normalization should keep every sample within the [-1, 1] range.

How to Reproduce?

Steps to reproduce


  1. Process a WAV file with 16-bit PCM samples using the grab method.
  2. Observe that the minimum normalized value falls slightly below -1.

What have you tried to solve it?

Divide by 32768.0f instead of Short.MAX_VALUE, so that -32768 maps to exactly -1.0 and 32767 maps to 32767 / 32768 ≈ 0.99997.
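A minimal Java sketch of the proposed divisor change, assuming the samples are read from a java.nio.ShortBuffer into a List<Float> as in the quoted line. The class, constant, and method names are hypothetical, and this is not the code merged in PR #3646.

```java
import java.nio.ShortBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes samples arrive in a java.nio.ShortBuffer.
// A sketch of the proposed fix, not the code merged in PR #3646.
public class Pcm16Normalizer {

    // 2^15, the magnitude of Short.MIN_VALUE, so -32768 maps to exactly -1.0f.
    static final float PCM16_SCALE = 32768.0f;

    static List<Float> toFloatSamples(ShortBuffer buffer) {
        List<Float> list = new ArrayList<>(buffer.remaining());
        while (buffer.hasRemaining()) {
            // Results lie in [-1.0, 32767/32768], never below -1.
            list.add(buffer.get() / PCM16_SCALE);
        }
        return list;
    }

    public static void main(String[] args) {
        ShortBuffer buffer = ShortBuffer.wrap(
                new short[] {Short.MIN_VALUE, 0, Short.MAX_VALUE});
        System.out.println(toFloatSamples(buffer));
    }
}
```

Dividing by 2^15 is a common convention for 16-bit PCM; the trade-off is that the largest positive sample lands just under 1.0 instead of exactly on it.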

Additional Context:

This issue affects models that expect perfectly normalized audio input, such as WebRTC VAD.

leleZeng, Mar 18 '25 06:03

@leleZeng Indeed, this is a bug. Would you mind creating a PR to fix it?

frankfliu, Mar 18 '25 18:03

Fixed in PR #3646

leleZeng, Mar 19 '25 08:03