Non-directional microphones, raw files, and synthetic sound

Open oleg-alexandrov opened this issue 5 years ago • 9 comments

Dear developers,

I have been very impressed by the ODAS software and documentation. I read the wiki page on the configuration. Some things that are still not clear to me are:

  • What direction should be specified for a non-directional microphone? Is (0, 0, 0) fine?

  • Could something be said about the RAW files? Are they binary or text? Are the values float32, float64, or something else?

  • Is there any tool to create synthetic RAW files given n sources and m microphones?

I understand that especially the last question is a bit out of scope of your project. I would appreciate any information you can provide (and hopefully it can be added to the wiki page). Thank you.

oleg-alexandrov • Mar 12 '19 22:03

It would be nice if the wiki had a worked-out example, that is, a configuration file together with the recorded sound.

My problem is that I don't have a microphone array, or even a robot; my project is pure simulation. According to the math I know, sound is a wave. In one spatial dimension it looks like

cos ( -k*x + mu * k * t)

where k determines the frequency, and mu is the speed of sound. In 3D it is a little more complicated,

cos ( -k1*x - k2*y - k3*z + mu * sqrt(k1^2+k2^2+k3^2) * t)

(it also decays as 1/r but that is not important here)

where (k1, k2, k3) gives the direction and frequency.

I converted this to the format ODAS expects (signed 32-bit samples, measured at each microphone) and saved it as an 8-channel raw file. I am getting some answers, but not the right ones.
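For reference, the simulation described above can be sketched as follows. This is a minimal sketch, not ODAS code: the speed of sound (343 m/s), the helper names, and the interleaved little-endian signed 32-bit layout are assumptions, and the bit depth, sample rate, and channel count have to match whatever the ODAS configuration declares for its raw input. The amplitude is kept below full scale so the conversion to integers cannot overflow.

```python
import math
import struct

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def plane_wave_frames(mics, direction, freq_hz, fs, n_samples, amplitude=0.5):
    """Sample cos(-k.x + 2*pi*f*t) at every microphone position.

    mics: list of (x, y, z) positions in metres.
    direction: propagation direction of the wave (normalised below).
    Returns a list of frames, each holding one float per microphone.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # |k| in rad/m
    frames = []
    for n in range(n_samples):
        t = n / fs
        frame = []
        for pos in mics:
            k_dot_x = wavenumber * sum(u * p for u, p in zip(unit, pos))
            phase = -k_dot_x + 2.0 * math.pi * freq_hz * t
            frame.append(amplitude * math.cos(phase))
        frames.append(frame)
    return frames


def frames_to_raw_int32(frames):
    """Pack frames as interleaved little-endian signed 32-bit samples."""
    full_scale = 2 ** 31 - 1
    out = bytearray()
    for frame in frames:
        for value in frame:
            clipped = max(-1.0, min(1.0, value))  # guard against overflow
            out += struct.pack("<i", int(round(clipped * full_scale)))
    return bytes(out)
```

Writing the result of `frames_to_raw_int32(...)` to a file opened in binary mode gives a headerless file with one sample per microphone per frame.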

Any thoughts? How do you folks even validate your code? I'd think an example with perfectly simulated sound should be useful, as real world data has noise and it is not always clear how it affects the computation.

oleg-alexandrov • Mar 16 '19 02:03

The issue may be in how you're measuring at each microphone: the delays between microphones are paramount, and your model (at least the part of it you've explained) does not seem to consider them.

In any case, if I may be granted a bit of self-promotion: the Acoustic Interactions for Robot Audition (AIRA) corpus may be of some use here: https://aira.iimas.unam.mx/ We use it to quickly evaluate algorithms, although I haven't tested it with ODAS.

balkce • Mar 16 '19 04:03

Your dataset could be very helpful. The article is pay-walled though. I would be looking for an example without much reverberation, since ODAS does not model that as far as I can see. Would you have a favorite example out of those I could try? Also, do you know of some other interesting algorithms for directional microphones that are public?

I thought my formula cos ( -k1*x - k2*y - k3*z + mu * sqrt(k1^2+k2^2+k3^2) * t) models everything. For a given time t, the quantities (x, y, z) are at the microphone position. The waveform at each position differs by arrival time from a waveform at a different position.

Do you think I am missing something? Thank you for your help. I know very little about robotics and acoustics; I usually do a different kind of work.

oleg-alexandrov • Mar 16 '19 05:03

I will also play with the three-microphone software. Thanks again.

oleg-alexandrov • Mar 16 '19 05:03

That's odd. I'm seeing the article from home without having a pay-wall (and I'm pretty sure I paid an Open Access fee for it to be published that way). If you want, I can send you a copy of the article if you keep having trouble reading it.

There are some scenarios in AIRA that don't have any reverberation (anechoic chamber).

As for directional microphones, I'm not sure how focused you have your microphones, but some works that use cardioid microphones (which are somewhat directional) that I've seen are:

https://ieeexplore.ieee.org/abstract/document/4415116
https://link.springer.com/chapter/10.1007/978-3-319-25554-5_44
https://ieeexplore.ieee.org/abstract/document/4650760
https://ieeexplore.ieee.org/abstract/document/7415417

And as for your formula, I may be confusing things, but where does it consider the position of the source you're simulating? As for me, I tend to assume planar waves (far-field assumption), which makes life much easier. Then, I use the following diagram to calculate the time-differences-of-arrival (TDOA) for each microphone, given a source angle and a given reference microphone.

[Figure: TDOA diagram]

Here t_{3-1} is the TDOA for microphone 3. Once that is calculated, you can compute the cos wave arriving at mic 3 as cos(2 * pi * k * (t-t_{3-1}) ), k being the frequency and t the time variable. Obviously, t_{1-1} is 0, since microphone 1 is the reference mic.
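The far-field TDOA calculation described above can be sketched like this. It is only an illustration under stated assumptions: the function names are made up, the speed of sound is taken as 343 m/s, and `doa` is a vector pointing from the array toward the source, so a microphone that sits closer to the source than the reference mic gets a negative TDOA.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def far_field_tdoas(mics, doa, ref_index=0, c=SPEED_OF_SOUND):
    """Return the TDOA in seconds of every mic relative to the reference mic.

    mics: list of (x, y, z) positions in metres.
    doa: vector pointing from the array toward the source (normalised below).
    """
    norm = math.sqrt(sum(d * d for d in doa))
    unit = [d / norm for d in doa]
    ref = mics[ref_index]
    tdoas = []
    for pos in mics:
        # Projection of the mic offset onto the direction of arrival.
        offset = sum(u * (p - r) for u, p, r in zip(unit, pos, ref))
        # Mics displaced toward the source hear the wavefront earlier.
        tdoas.append(-offset / c)
    return tdoas


def delayed_cos(freq_hz, t, tdoa):
    """Cosine of frequency freq_hz as observed at a mic with the given TDOA."""
    return math.cos(2.0 * math.pi * freq_hz * (t - tdoa))
```

Evaluating `delayed_cos` at t = n / fs for each microphone's TDOA produces one channel of the simulated recording per microphone.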

Hope this helps.

balkce • Mar 16 '19 06:03

Caleb, I realized that I can see the paywalled article at https://www.sciencedirect.com/science/article/abs/pii/S0003682X98000267 from my work so that is good. I also see your other publication at https://aira.iimas.unam.mx/aira.pdf so that is great.

In the formula I put in, I also assume the planar wave, so there is no source and no attenuation. It does not model microphone directionality though. That is the formula for the plane wave, per https://en.wikipedia.org/wiki/Plane_wave#Mathematical_representations.

I understand your geometric diagram. It does not take into account the wave frequency, just the physical principles of how waves move around. That is good enough, since actual sound can always be measured, if you have hardware.
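The two descriptions can be reconciled with a short derivation. Write the wave vector as k = (k1, k2, k3) and keep mu for the speed of sound, as in the plane-wave formula. The symbols tau (delay) and f (temporal frequency, which the TDOA post calls k) are introduced here only for illustration. The plane wave is then a cosine delayed by a position-dependent amount:

```latex
\begin{aligned}
\cos\!\left(-\mathbf{k}\cdot\mathbf{x} + \mu\,\lVert\mathbf{k}\rVert\,t\right)
  &= \cos\!\left(\mu\,\lVert\mathbf{k}\rVert\,\bigl(t - \tau(\mathbf{x})\bigr)\right),
  \qquad \tau(\mathbf{x}) = \frac{\hat{\mathbf{k}}\cdot\mathbf{x}}{\mu},
  \quad \hat{\mathbf{k}} = \frac{\mathbf{k}}{\lVert\mathbf{k}\rVert}, \\
t_{m-1} &= \tau(\mathbf{x}_m) - \tau(\mathbf{x}_1)
  = \frac{\hat{\mathbf{k}}\cdot(\mathbf{x}_m - \mathbf{x}_1)}{\mu},
  \qquad f = \frac{\mu\,\lVert\mathbf{k}\rVert}{2\pi}.
\end{aligned}
```

So the delay depends only on the geometry and the speed of sound, and the frequency enters only through f. Up to a phase offset common to all microphones, sampling the plane-wave formula at microphone m gives cos(2 * pi * f * (t - t_{m-1})), where the unit vector points in the direction the wave travels (away from the source).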

I will study things more on Monday. I appreciate your help.

oleg-alexandrov • Mar 16 '19 16:03

This may not apply to you, but I have found this package pretty useful: https://github.com/LCAV/pyroomacoustics

gladwig2 • Mar 17 '19 15:03

I have not had a chance to study the AIRA corpus yet. I started with the pyroomacoustics package since that one is heavy on simulation, both of the room and the sounds. The latter can be specified either as a random sequence at the source or by reading from a wav file. I also tried to override their recordings at the microphones with a simple plane wave before invoking their solver. They implement MUSIC and a few others. In either case I get very nice correct answers, both with 2D and 3D direction search (though for the latter I had to arrange the microphones in a 3D configuration to get improved results).

I so much appreciate all the help!

I don't know why ODAS did not work well. I was careful to write the RAW files properly and read them back using pcm_normalized2signedXXbits, etc. (this function does not like it when the amplitude is either 1 or -1, I don't recall which now, so it is better to keep it < 1). I played a lot with microphone array configurations, for example the one from xmos.cfg. I also tried a cube array, and two arrays of 7 microphones on top of each other. I played quite a bit with the range of angles.

I was getting sane results. For example, when changing the sign of the direction from which the wave arrives, some of the outputs in sources.txt would change sign as well. But the results are just not right overall.

ODAS really likes to output solutions in sources.txt that have a large z. Here is an example:

"x": -0.066, "y": 0.021, "z": 0.998, "E": 0.406

The solution also wiggles around even though I keep sampling the same wave, just at different times. Here is an example:

{ "timeStamp": 83, "src": [ { "x": -0.066, "y": 0.021, "z": 0.998, "E": 0.431 } ] }
{ "timeStamp": 84, "src": [ { "x": 0.041, "y": -0.056, "z": 0.998, "E": 0.432 } ] }
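To see what these frames mean as angles, the output can be parsed and converted to azimuth and elevation. This is a sketch that assumes the file is a sequence of JSON objects shaped like the excerpt above. The helper names are hypothetical, and the angle convention (azimuth measured in the x-y plane from the x axis, elevation measured up from that plane) is an assumption.

```python
import json
import math


def iter_json_objects(text):
    """Yield each JSON object from a string holding several in a row."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between consecutive objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def direction_angles(x, y, z):
    """Return (azimuth, elevation) in degrees for a direction vector."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


def frame_angles(text):
    """Map each timeStamp to a list of (azimuth, elevation) pairs."""
    result = {}
    for frame in iter_json_objects(text):
        result[frame["timeStamp"]] = [
            direction_angles(s["x"], s["y"], s["z"]) for s in frame["src"]
        ]
    return result
```

For the two frames quoted above, the elevation comes out at roughly 86 degrees in both, so the estimate points almost straight up. That close to the z axis the azimuth is poorly conditioned, which may be part of why small changes in x and y look like large wiggles.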

I am sure I am doing something wrong. The problem is that there is just too little documentation to be able to understand how to make things work. And really, one simple analytical/simulated example would help so much, and would be a good sanity check, independent of hardware.

oleg-alexandrov • Mar 20 '19 01:03

If you are talking about the separated file or the post-filtered file, you can play the raw files in Audacity: File -> Import -> Raw Data, then choose your channel count, sample rate, and little-endian, signed 16-bit PCM. (This works in the master branch.)
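The same import can be done without Audacity by wrapping the raw samples in a WAV header. This is a minimal sketch using Python's standard `wave` module, assuming little-endian signed 16-bit PCM as described above. The function name is hypothetical, and the channel count and sample rate are not stated in this thread, so they have to be taken from your own configuration.

```python
import wave


def raw_to_wav(raw_path, wav_path, channels, sample_rate, sample_width=2):
    """Wrap headerless little-endian signed PCM in a WAV container.

    sample_width is in bytes (2 for signed 16-bit PCM).
    Assumes a little-endian host, as on typical x86 and ARM machines.
    Returns the number of frames written.
    """
    with open(raw_path, "rb") as src:
        data = src.read()
    frame_size = channels * sample_width
    usable = len(data) - (len(data) % frame_size)  # drop a trailing partial frame
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(sample_rate)
        dst.writeframes(data[:usable])
    return usable // frame_size
```

The resulting file opens in any ordinary audio player, which makes it easy to check by ear whether the channel count and sample rate were guessed correctly.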

Nonikka • May 30 '19 09:05