Add 'fact' chunk for WAVE_FORMAT_EXTENSIBLE
Draft PR as this is a work in progress (lots of assumptions are made) and tests are failing.
Started working on this as a solution to #141. The linked page suggests There is a question as to whether the fact chunk should be used for (including those with PCM) WAVE_FORMAT_EXTENSIBLE files. One example of a WAVE_FORMAT_EXTENSIBLE with PCM data from Microsoft, does not have a fact chunk., but libsndfile will complain of a missing fact chunk if there is no fact chunk present in a WAVE_FORMAT_EXTENSIBLE file, so I'm not sure how necessary the field actually is, especially considering the data for the chunk is redundant (and we aren't using it in any meaningful way).
This is the minimum work required to add the fact chunk for WAVE_FORMAT_EXTENSIBLE files.
This is the output of sndfile-info on a decoded WAV file with 24 bps tested with this PR:
File : test.wav
Length : 41539280
RIFF : 41539272
WAVE
fmt : 40
Format : 0xFFFE => WAVE_FORMAT_EXTENSIBLE
Channels : 2
Sample Rate : 48000
Block Align : 6
Bit Width : 24
Bytes/sec : 288000
Valid Bits : 24
Channel Mask : 0x3 (L, R)
Subformat
esf_field1 : 0x1
esf_field2 : 0x0
esf_field3 : 0x10
esf_field4 : 0x80 0x0 0x0 0xAA 0x0 0x38 0x9B 0x71
format : pcm
fact : 4
frames : 6923200
data : 41539200
End
----------------------------------------
Sample Rate : 48000
Frames : 6923200
Channels : 2
Format : 0x00130003
Sections : 1
Seekable : TRUE
Duration : 00:02:24.233
Signal Max : 8.36785e+06 (-0.02 dB)
@ktmf01 What are your thoughts on this? I almost see this as bloat as it is redundant data - but it is supposedly part of the spec, and libsndfile in particular will tell you the chunk is missing. This alone probably won't fix the decoding issue experienced in #139, but supposedly it's the "right" thing to do.
In a seperate PR I will add a --force-legacy-wav-format option that will mimic the behavior before bb750734287a4079ca3de9ff85c71cc62160ac46, as that would bear some use regardless of the outcome of this PR.
WAVE update (Revision: 3.0), 1994-04-15: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/RIFFNEW.pdf says
The fact chunk is required for all new WAVE formats. The chunk is not required for the standard WAVE_FORMAT_PCM files.
However, WAVE_FORMAT_EXTENSIBLE is not yet defined in this spec.
Multi-channel / high bit resolution formats, 2001-12-04: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/multichaudP.pdf Doesn't mention the fact chunk once. It however clearly states multiple times that WAVE_FORMAT_EXTENSIBLE is meant to extend WAVE_FORMAT_PCM, for example
Microsoft has extended formats that are traditionally mono or stereo and 8-bit or 16-bit. Basing WAVE_FORMAT_PCM and WAVE_FORMAT_IEEE_FLOAT on a structure named WAVEFORMATEXTENSIBLE does this.
Also the document on WAVE_FORMAT_EXTENSIBLE only lists PCM formats. The whole document is about the ambigouity of PCM with more than 16 bits per sample and more than 2 channels in WAVE_FORMAT_PCM.
http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html states, right below the extensible format table at the very bottom of the document
The fact chunk can normally be omitted if the sampled data is in PCM format.
So looking at the Microsoft documents, I can't see why the fact chunk should be mandatory. It isn't forbidden either though. Finally, latest ffmpeg (compiled fresh from git) also creates WAVE_FORMAT_EXTENSIBLE without a fact chunk.
What have we got against? sndfile-info says
**** All non-PCM format files should have a 'fact' chunk.
Maybe we should file an issue with libsndfile where this message comes from?
I just noticed this bit in the referenced document:
There is a question as to whether the fact chunk should be used for (including those with PCM) WAVE_FORMAT_EXTENSIBLE files. One example of a WAVE_FORMAT_EXTENSIBLE with PCM data from Microsoft, does not have a fact chunk.
As you can see I've referenced this pull request in an issue with libsndfile
edit: Only now I see you already found this bit and referenced it in the first post. Sorry, I missed that.
I just noticed this bit in the referenced document:
There is a question as to whether the fact chunk should be used for (including those with PCM) WAVE_FORMAT_EXTENSIBLE files. One example of a WAVE_FORMAT_EXTENSIBLE with PCM data from Microsoft, does not have a fact chunk.
As you can see I've referenced this pull request in an issue with libsndfile
edit: Only now I see you already found this bit and referenced it in the first post. Sorry, I missed that.
Thank you for your input. Fwiw, I am generally opposed to this as it is redundant information with no confirmed benefit other than to silence libsndfile. I threw it up as a minimally functional patch in response to the issue.
Any media player that was affected by the change away from WAVE_FORMAT_PCM likely wouldn't be solved by this, but rather need to be updated/changed to properly detect the format. If requested, I'm more than happy to provide some test files (alternatively, you can compile this branch and make your own test files. Do note that tests are still failing for now).
I'll leave it up for now for discussion. If @erikd (or anyone else, for that matter) can provide an actual use-case or reasoning to continue forward with this patch set, then I will finish this.
@ktmf01 I did some more searching around and it looks like libsox actually does write a fact chunk for non-PCM formats, and will do so automatically for WAVE_FORMAT_EXTENSIBLE. The blame shows that the code was written 16 years ago. I can't imagine any resource explicitly relies on having the fact chunk present, but that's just some food for thought.
@ktmf01 I'm going to go ahead and close this. The data is redundant and there's no clear specification that strictly requires this.
Thanks for putting in the effort, sadly it didn't work out
Thanks for putting in the effort, sadly it didn't work out
It wasn't that much effort at all. I'm a big fan of flac and took it as a little bit of a learning experience. It purposefully wasn't a complete PR because I didn't know if it was something we even wanted. I do intend to contribute in the future, for sure ☺️