pyroomacoustics icon indicating copy to clipboard operation
pyroomacoustics copied to clipboard

room.plot_rir ignores "global_delay"

Open boeddeker opened this issue 2 years ago • 3 comments

Hey,

I compared pyroomacoustics with habets image method RIR generation code and was confused by two things:

  • attenuation
  • time of flight

download For the time of flight, I found the warning in the documentation (doc), image

but the later example in the documentation ignores it and also room.plot_rir ignores it. For the attenuation, I found nothing in the documentation, but maybe you used a different word to describe it.

It would be nice, if you could answer me the following questions:

  • Why does pyroomacoustics has a global delay? Sorry, if it is already in the documentation, but I didn't found it.
  • Why is the attenuation different from habets? habets looks more reasonable, because I never found there a RIR with values above 1.

Ideas for the documentation/code:

  • Consider the "global delay" in the example code and room.plot_rir.
  • Add a pointer in the documentation, why there is a "global delay" (Sorry, but I don't know, what a fractional delay filter is and hence I am not able to get the reason)

Here is an example that shows:

  • Example code from the documentation has a wrong x-axis.
  • room.plot_rir has a wrong x-axis.
  • The RIR can contain values above 1.
import pyroomacoustics as pra
import numpy as np

rt60 = 0.5  # seconds
room_dim = [4, 4, 3]  # meters

e_absorption, max_order = pra.inverse_sabine(rt60, room_dim)
room = pra.ShoeBox(room_dim, fs=16000, materials=pra.Material(e_absorption), max_order=max_order)
room.add_source([1.5, 2, 1.5])
mic_locs = np.c_[[2, 2, 1.5]]
room.add_microphone_array(mic_locs)
room.compute_rir()

mic = room.mic_array.R[:, 0]
source = room.sources[0].position
dist = np.sqrt(np.sum(np.abs(room.mic_array.R[:, 0] - room.sources[0].position)**2))
print(f'mic: {mic}, source: {source}, dist: {dist}')

sound_velocity = pra.constants.get("c")
time_of_flight_in_seconds = dist / sound_velocity
time_of_flight_in_samples = time_of_flight_in_seconds * room.fs
global_delay = pra.constants.get("frac_delay_length") // 2

import matplotlib.pyplot as plt
plt.plot(room.rir[0][0], label='rir')
plt.plot([time_of_flight_in_samples, time_of_flight_in_samples], [-1, 1], label='time_of_flight')
plt.plot([time_of_flight_in_samples + global_delay, time_of_flight_in_samples + global_delay], [-1, 1], label='time_of_flight+global_delay')
plt.legend()
plt.gca().set_xlim([-10, 500])

plt.show()

room.plot_rir([(0, 0)])
plt.gca().set_xlim([-0.02, 0.05])

plt.plot([time_of_flight_in_seconds, time_of_flight_in_seconds], [-1, 1], label='time_of_flight')
plt.plot([
    time_of_flight_in_seconds + global_delay / room.fs, 
    time_of_flight_in_seconds + global_delay / room.fs
], [-0.3, 0.3], label='time_of_flight+global_delay')
plt.legend()

image

Note:

  • The from pyroomacoustics.directivities import ... from the documentation didn't work with my installation of pyroomacoustics.

boeddeker avatar Jan 14 '22 10:01 boeddeker

Hi @boeddeker , this has come up in some forms several times 😄 Thanks for taking the time to post your questions and suggestions here! Suggestions for improving the doc are also welcome!

Fractional Delay Filters

First, let me describe briefly what a fractional delay filter is. We start in continuous time; the echoes arrive at the microphone at real valued delays. However, practical requirements means that all we can compute is the RIR at discrete values. One way to do that is to low-pass at fs / 2 and then sample at fs. Let us use the perfect low-pass filter, i.e. the sinc function. Note that we can actually compute this exactly for each echo by shifting the sinc by the delay and compute its value on the sampling grid. We note two things.

  1. If the delay is an integer, all the sampling grid points coincides with zeros of the sinc function! A "perfect Dirac"!
  2. If the delay is not an integer, the sinc value is non-zero at all the sampling grid points (both before and after the delay). Now, computing this exactly is not very practical because we would need to truncate the sinc anyway at some value, and in addition, we need to compute the sinc for all grid points, for all echoes. So we apply a (Hann) window to the sinc to limit it to -frac_delay_length, frac_delay_length interval. This means that this is not a perfect low-pass filter anymore, but things are easier to compute, and we only lose a little bit of high frequency. While the filter is non-causal, it is nice because it is linear phase.

In pyroomacoustics, computing all the sinc functions was actually taking most of the computation time, so I replaced it by a look-up table with linear interpolation. Again, this should not introduce too much distortion.

Global Delay

The reason for the global delay is due to the fractional delay filter implemented as a windowed sinc. The filter has ripples in past samples for pra.constants.get("frac_delay_length") // 2 as indicated in the documentation and explained above. If the time of flight is less than that (40 samples, i.e. source-mic distance of approx. 85 cm at 16 kHz), then the beginning of the filter will have an index smaller than zero. This is why I decided to have the global offset. I am not sure what the length of the fractional delay filter is in Habets' code. If you want a shorter delay, you can adjust the constant before simulation (e.g. pra.constants.set("frac_delay_length", 20)).

When you say that the plot is wrong in the example, I guess you mean the time should start at -40 samples, correct ? Or is there any other error ?

Of course we could shift all RIR by 40 samples back, but this would truncate the sinc in some cases and this would introduce some high frequency artefacts. I think for most purposes, the global delay is not really an issue, and when it is, we can simply shift all timings by 40 / fs.

Amplitude

The reason for the different amplitude is that I have decided to not divide by 4 \pi. The relative amplitudes of the RIR are correct as far as I know and a global scaling is not very relevant for audio signals I think.

Also, not that the amplitude is divided by the source-mic distance so that you could get arbitrarily large amplitudes by bringing the source and mic very close. In practice you will always have the physical size of the source/mic limiting the minimal distance, but even like this, limiting the amplitude be smaller than one seems an arbitrary choice.

Directivities

I tried on my local 0.6.0 installation (from pip) and it seems to work... Do you have more details ?

In [1]: from pyroomacoustics.directivities import Directivity

In [3]: import pyroomacoustics as pra

In [4]: pra.__version__
Out[4]: '0.6.0'

In [5]: from pyroomacoustics.directivities import (
   ...:     DirectivityPattern,
   ...:     DirectionVector,
   ...:     CardioidFamily,
   ...: )

In [6]: dir_obj = CardioidFamily(
   ...:     orientation=DirectionVector(azimuth=90, colatitude=15, degrees=True),
   ...:     pattern_enum=DirectivityPattern.HYPERCARDIOID,
   ...: )

fakufaku avatar Jan 14 '22 13:01 fakufaku

First, thank you for the explanation and taking the time to write a long response.

Fractional Delay Filters and Global Delay

The fractional delay filters and global delay is now clear to me.

... This is why I decided to have the global offset. I am not sure what the length of the fractional delay filter is in Habets' code.

I think keeping the values is a good decision. In Habets code, the first peak matches the time of flight. So I guess the values are ignored. Maybe this rirgen/rirgen.h#L193 is the code for the length of the fractional delay filter in Habets code. So the global offset would be 32 or 64 samples, depending on the sample rate (8 kHz vs 16 kHz).

When you say that the plot is wrong in the example, I guess you mean the time should start at -40 samples, correct ? Or is there any other error ?

No, I only observed the wrong starting point. For some experiments, I want to switch from Habets to pyroomacoustics and before doing so, I wanted to check the differences. In our simulation code, we compensate the time of flight based on a peak estimate, so the global delay shouldn't introduce issues.

Amplitude

Thank you, with the 4 \pi the scale looks now identical:

image

Also, not that the amplitude is divided by the source-mic distance so that you could get arbitrarily large amplitudes by bringing the source and mic very close. In practice you will always have the physical size of the source/mic limiting the minimal distance, but even like this, limiting the amplitude be smaller than one seems an arbitrary choice.

I never checked what are reasonable values for a RIR, I simply sometimes plot RIRs and observed that the values are smaller than one. I just thought, if I play a signal and then record it at another position, it shouldn't be amplified.

... and a global scaling is not very relevant for audio signals I think.

In many cases I agree, but sometimes the scaling is important. For example, when you want to train on simulated data and evaluate on real data. Or more general, you have experiments with multiple RIR source (e.g. pyroomacoustics, Habets, measured, rerecordings, ...).

Directivities

Sorry, my fault. Everything is fine. I installed the pyroomacoustics in a new environment and executed the code in an old one. I only checked the folder names on GitHub and because of the warning "As of Sep 6, 2021, setting directivity patterns for sources and microphone is only supported for the image source method (ISM). Moreover, source direcitivities are only supported for shoebox-shaped rooms." I thought the code was moved.

Action point

Should I prepare a PR? Fixing the plot function and the plot code in the documentation shouldn't be difficult. I could also add a small comment to the global_delay, but since you wrote

this has come up in some forms several times

It may be better to write some more text and for that, I think you are better qualified. Or I could add a pointer to your answer in this issue.

boeddeker avatar Jan 14 '22 15:01 boeddeker

Thanks for your patience! My reply is overdue!!! 😅

If you are willing to do a PR to fix the example that would be fantastic!

I also think it may be nice to add some more explanation about the delay in the documentation, but it is also fairly technical, so if you have a suggestion of the right place to do it that would be welcome.

And, just a quick note about the scale. I think that especially for DNNs you should not rely on the global scale of the input signals. Even when only using recorded signals, the gain of the amplifier may be set differently for different recordings or train/test time. So in this case it is more realistic to have a systematic normalization of the input signals.

fakufaku avatar Jan 31 '22 14:01 fakufaku