flac icon indicating copy to clipboard operation
flac copied to clipboard

flac.exe on 4 GiB PCM: truncates without warning when channelcount*bytepersample is 1, 3, 5 or 15

Open H2Swine opened this issue 3 years ago • 12 comments

Expected when trying to flac a malformed .wav with > 2^32-1 bytes of PCM: a rejection, unless --ignore-chunk-sizes is called Typical flac.exe behaviour: the 2^32-1 limit encountered in a partial sample and therefore that error (but, file written - is that desirable?) But: silent truncation (lossy) if: 2^32-1 is an integer number of samples per channel, like in the attached example file.

wavefile-and-flacfiles-doublecompressed.zip

H2Swine avatar Mar 05 '22 08:03 H2Swine

I've been pondering how to fix this, but I think there is no clean solution possible.

The extra data that follows could belong to the data chunk, but it could also be the start of a new chunk. For this specific file, it is clear it isn't (it is too short to be a new chunk) but a more general approach isn't possible.

Perhaps checking the length of the master RIFF chunk would help here?

ktmf01 avatar Aug 29 '22 10:08 ktmf01

Not sure if I understood correctly ... pretty sure I didn't, but anyway:

  • Either the file declares audio to be 2^32-1 bytes long - of maximum length - and then it isn't malformed. At least not this kind of malformed, and we are not in the situation I reported at all?
  • Or if it declares something else, and then FLAC (without --ignore-chunk-sizes) would object because audio doesn't match declaration?

Was that ... off mark?

H2Swine avatar Sep 01 '22 09:09 H2Swine

  • Or if it declares something else, and then FLAC (without --ignore-chunk-sizes) would object because audio doesn't match declaration?

The problem is that there is no way for FLAC to determine whether the audio doesn't match the declaration, except when there is a partial sample. At first I tried to fix this by simply checking whether there was any left-over data after the data chunk (the audio) but it is very well possible there is another chunk (with metadata) that follows the data chunk.

So, the only way to determine where the audio starts and stops is with the chunk length. --ignore-chunk-sizes is a hack (it does warn to use with caution after all) that assumes there is no other chunk after the data chunk, but this is not necessarily the case.

FLAC should warn that there is data trailing the data chunk (which could be audio or another chunk) and it doesn't do that indeed. That approach would result in a warning, not an error.

ktmf01 avatar Sep 01 '22 13:09 ktmf01

The issue was not about having flac --ignore-chunk-sizes tell audio from non-audio, it was about behaviour in the absence of --ignore-chunk-sizes . If I understand the following correctly, it hits spot on to what I tried to raise this issue over:

FLAC should warn that there is data trailing the data chunk (which could be audio or another chunk) and it doesn't do that indeed. That approach would result in a warning, not an error.

  • That means: FLAC should report back (as "warning" or "error", that is a policy question) that the specified chunk sizes are wrong, and that the user must consider --ignore-chunk-sizes with the risks that incurs? Yes, that is my point.
  • But, observed behaviour: FLAC does not warn, it happily truncates.

And that isn't good, I think.

H2Swine avatar Sep 02 '22 11:09 H2Swine

  • That means: FLAC should report back (as "warning" or "error", that is a policy question) that the specified chunk sizes are wrong, and that the user must consider --ignore-chunk-sizes with the risks that incurs? Yes, that is my point.

But the chunk sizes don't have to be wrong for this trailing data to be present. So, FLAC cannot discern that chunk sizes are wrong. It can only warn that there is data trailing the data chunk.

ktmf01 avatar Sep 02 '22 11:09 ktmf01

FLAC should be able to realize that this "WAVE file" is too large to be a WAVE file?

H2Swine avatar Sep 02 '22 15:09 H2Swine

Yes, in this instance that is possible, but that is not a general solution. Even so, I won't fix this before the release of 1.4.0, this need thorough testing and I really want to get a release with fuzzing fixes out.

ktmf01 avatar Sep 02 '22 18:09 ktmf01

No, there is no general solution for flac --ignore-chunk-sizes to be able to tell audio from non-audio in trailing data. But that is not what I am reporting.

This is about how FLAC without "--ignore-chunk-sizes" still ignores an obviously wrong chunk size, and thereby - without warning - drops samples (and thus, drops losslessness).

H2Swine avatar Sep 02 '22 19:09 H2Swine

I meant that just looking at the filesize wouldn't be of any use for a smaller file with a wrong chunk size.

ktmf01 avatar Sep 02 '22 20:09 ktmf01

You still did not migrate to w64? Oh.

ValZapod avatar Sep 04 '22 00:09 ValZapod

@ValZapod Are you sure you didn't post this at the wrong issue?

ktmf01 avatar Sep 04 '22 17:09 ktmf01

@H2Swine I've published a fix in #446. There is another problem though: using --keep-foreign-metadata on this file makes FLAC use a lot of memory until it is killed by my OS. It seems it tries to buffer the complete file in search of metadata past the data chunk. The original issue should be fixed with #446. Let me know if you need a compile to test.

ktmf01 avatar Sep 13 '22 17:09 ktmf01