[discussion] what to do with diversity of input formats?
... to be continued...
Use the sox tool in the VM to convert everything to WAV? Use other scripts in tools/ or toolbox/ or varia/ to convert to RTTM? Just joining the discussion :)
One question we had: we know that some of the tools (looking at you, noisemes) give different results depending on whether the input has 1 or 2 channels. Wouldn't it be important to benchmark all the tools (except tocombosad, which doesn't work on 2 channels) on both 1- and 2-channel WAVs (we can use the CAS dataset, which has stereo WAVs), so that we know which input gives the better results? If it turns out, for example, that mono WAVs systematically give better results, we can force the conversion when calling the tool (or even before). Maybe some of the tools also force the conversion internally...
The test could be done on several parameters of the input audio:
- number of channels (mono vs stereo, or more)
- sample rate (16k vs 44k)
- encoding (mp3 vs wav vs other ...)
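To make the test matrix concrete, here is a rough sketch of how the channel and sample-rate variants could be generated with sox. It only builds the commands; nothing here is existing project code. The function names and output naming scheme are made up for illustration, and I am assuming "44k" means 44100 Hz. Encoding variants (mp3 etc.) are left out because they depend on how sox was built in the VM.

```python
import itertools
from pathlib import Path

# Values taken from the list above; 44100 is an assumed reading of "44k".
CHANNELS = (1, 2)
SAMPLE_RATES = (16000, 44100)


def sox_command(src, dst, channels, rate):
    """Build a sox call that writes dst with the given channel count and rate.

    -c and -r placed before the output file apply to the output.
    """
    return ["sox", str(src), "-c", str(channels), "-r", str(rate), str(dst)]


def variant_commands(src, out_dir):
    """Return {(channels, rate): sox command} for every combination.

    The output file naming scheme is hypothetical.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    commands = {}
    for channels, rate in itertools.product(CHANNELS, SAMPLE_RATES):
        dst = out_dir / f"{src.stem}_{channels}ch_{rate}hz.wav"
        commands[(channels, rate)] = sox_command(src, dst, channels, rate)
    return commands
```

Each command can then be run with `subprocess.run(cmd, check=True)` before feeding the result to a tool. It makes sense to start from a stereo source (e.g. the CAS wavs), since upmixing a mono file to 2 channels would not give a meaningful stereo test.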
To clarify, here's the todo list:
- [ ] Check each tool: is it forcing a conversion unbeknownst to us? If so, what is the preferred input?
- [ ] Check each tool: does it give different results for the 3 variables Julien mentioned (# channels, sample rate, encoding)?
- [ ] If answer to the above is: no conversion, no difference, add layer that converts all input to the preferred input for a given tool
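For the last item, the conversion layer could look roughly like the sketch below. This is only an illustration under assumptions: the `PREFERRED_INPUT` table and both function names are hypothetical, and the values in the table are placeholders until the survey is done. The only entry grounded in this thread is that tocombosad does not work on 2 channels, so it gets mono; `example_tool` is a made-up name.

```python
import subprocess

# Hypothetical per-tool preferences, to be filled in from the survey.
# None means "leave this property unchanged".
PREFERRED_INPUT = {
    "tocombosad": {"channels": 1, "rate": None},     # mono only, per this thread
    "example_tool": {"channels": 1, "rate": 16000},  # placeholder values
}


def conversion_command(tool, src, dst):
    """Return the sox command converting src for tool, or None if no preference is known."""
    pref = PREFERRED_INPUT.get(tool)
    if pref is None:
        return None
    cmd = ["sox", str(src)]
    if pref["channels"] is not None:
        cmd += ["-c", str(pref["channels"])]
    if pref["rate"] is not None:
        cmd += ["-r", str(pref["rate"])]
    cmd.append(str(dst))
    return cmd


def prepare_input(tool, src, dst):
    """Convert src if the tool has a registered preference; return the path to use."""
    cmd = conversion_command(tool, src, dst)
    if cmd is None:
        return src  # nothing known about this tool, pass the input through
    subprocess.run(cmd, check=True)  # requires sox on the PATH
    return dst
```

Keeping the preferences in one table means the wrapper scripts do not need to change when the survey results come in; only the table does.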
This looks like a good task for the CMU student team, given the existing task to survey all the tools for other parameters: processing time, limits on input recording duration, memory consumption. That survey task may not be in the form of a GitHub Issue yet; it was just something we agreed to work on by email.
Sounds good! However, I think this task is not a priority. It's just something we want to bear in mind for the future (i.e. an improvement), since our current user base uses standardized sound formats. So let's hold off on assigning anyone until we have completed the other project.