if the data chunk size is unknown, wavfix does not make a good guess at the data size

Open dtill opened this issue 1 year ago • 2 comments

first of all, thanks for the great idea of this tool! :) the following is a very common scenario of wav file corruption which is not covered by wavfix: the data chunk size is zero (which often occurs when the recording device loses power). as i understand it, wavfix then tests all the data behind the data chunk and finds weird "chunks" by checking whether each one is a "valid" riff chunk:

https://github.com/agfline/wavfix/blob/master/src/libriff.c#L218-L250

this assumes that any 4 bytes in a row can define a new chunk. with the sound-devices recorder i mostly use, in my experience there is mostly only some chunk after the chunk, but the main part of the rest of the file is valid sample data that is still intact. when i try sox (sox --ignore-length XXX.WAV), the output is a perfect wav file with all the data, but all metadata and headers are lost. that's where wavfix should come in ;)

i assume that so far wavfix can only repair wav files whose sample data never contains 4 bytes that look like a chunk id, whereas in longer files the likelihood of that happening is very high.
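To get a rough feel for that likelihood, here is a minimal sketch. It assumes a chunk ID is accepted whenever all four bytes are printable ASCII, and that sample bytes are uniformly random. Both are simplifying assumptions: the real test in libriff.c may be stricter (it also looks at chunk sizes), and real audio is not uniform. The names `looks_like_fourcc` and `expected_false_ids` are hypothetical and not part of wavfix.

```python
# 95 printable ASCII byte values (0x20..0x7E); an assumed acceptance rule,
# not necessarily the one wavfix applies.
PRINTABLE = range(0x20, 0x7F)


def looks_like_fourcc(four: bytes) -> bool:
    """Return True if all 4 bytes are printable ASCII (hypothetical check)."""
    return len(four) == 4 and all(b in PRINTABLE for b in four)


def expected_false_ids(n_bytes: int) -> float:
    """Expected number of 4-byte windows that pass the check,
    assuming n_bytes of uniformly random data."""
    p = (len(PRINTABLE) / 256) ** 4  # about 1.9% per window
    return max(n_bytes - 3, 0) * p
```

Under these assumptions almost 2% of all byte positions would pass an ID-only test, so in a file of several hundred MB patterns such as 'AbvD' or 'MZ5g' are expected rather than exceptional.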

see my example wavfix output for a broken 900MB wav file:

Processing 'XXX.WAV'
| [w] Wrong RIFF size: 0000000000 B + 8 [file size: 0210763776 B;]
| Current file structure :
| ======================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0000000000 + 4 + 4;]
| [NULL] chunk [offset: 0000006144; size: 0003831957 + 0 + 0;] odd w/ pad
| bvDD chunk [offset: 0003838109; size: 0167829378 + 4 + 4;]
| [NULL] chunk [offset: 0171667495; size: 0039096289 + 0 + 0;] odd w/ pad
|
| [i] Checking data chunk..
| | Chunk size is 0 byte. That is very unlikely..
| | A block of 3831957 unknown bytes comes after data chunk.
| | Assume those are audio data. Merging them with data chunk..
| | done.
|
| Recovered file structure :
| ========================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0003831957 + 4 + 4;] odd w/ pad
| bvDD chunk [offset: 0003838101; size: 0167829378 + 4 + 4;]
| [NULL] chunk [offset: 0171667487; size: 0039096289 + 0 + 0;] odd w/ pad
| [w] 1 unknwon bytes block remains.
| Some programs like Pro Tools use wrong formated files
| like this ( DIGI chunk size missing 1 byte in PT ),
| so we wont correct it. Yet it should play fine anyway.
|
| [i] Saving repaired file to 'XXX_REPAIRED.WAV'
ADDING PADDING BYTE
ADDING PADDING BYTE
| File successfully recovered.

wavfix doesn't accept its own generated _REPAIRED file, and it is not a valid wav file.

in this case the pattern 'AbvD' in the sample data causes wavfix to think the data chunk is over.

Processing 'XXX_REPAIRED.WAV'
| [w] Wrong RIFF size: 0000000002 B + 8 [file size: 0210763786 B;]
| Current file structure :
| ======================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0003831957 + 4 + 4;] odd w/ pad
| AbvD chunk [offset: 0003838101; size: 0014647876 + 4 + 4;]
| [NULL] chunk [offset: 0018485985; size: 0095141419 + 0 + 0;] odd w/ pad
| MZ5g chunk [offset: 0113627412; size: 0041223827 + 4 + 4;] odd w/ pad
| [NULL] chunk [offset: 0154851247; size: 0016816249 + 0 + 0;] odd w/ pad
| [NULL] chunk [offset: 0171667504; size: 0039096289 + 0 + 0;] odd w/ pad
| [NULL] chunk [offset: 0210763801; size: 0000000001 + 0 + 0;] odd w/ pad
|
| [i] Checking data chunk.. failed.

this is the output after i manually repaired the file with a hex editor (the Specification of the Broadcast Wave Format (BWF) helped me):

Processing 'XXX_MANUALLY_REPAIR.WAV'
| Current file structure :
| ======================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0210757626 + 4 + 4;]
|
| [i] Checking data chunk.. ok.
|
| File ok.
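For readers who want to script this kind of manual repair, here is a minimal sketch. It assumes the data chunk is the last chunk in the file and runs to EOF, which is the Sound Devices case described above and not something the RIFF format guarantees. The function name `patch_sizes` is hypothetical, it is not how wavfix works, and it only handles files below 4 GiB (32-bit size fields).

```python
import struct


def patch_sizes(wav: bytearray) -> int:
    """Rewrite the RIFF size and the data chunk size in place.

    Assumption: 'data' is the last chunk and extends to the end of the file.
    Returns the new data chunk size in bytes.
    """
    if wav[0:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    off = 12  # first chunk header follows 'RIFF' + size + 'WAVE'
    while off + 8 <= len(wav):
        chunk_id = bytes(wav[off:off + 4])
        (size,) = struct.unpack_from("<I", wav, off + 4)  # little-endian uint32
        if chunk_id == b"data":
            data_size = len(wav) - (off + 8)
            struct.pack_into("<I", wav, off + 4, data_size)  # data chunk size
            struct.pack_into("<I", wav, 4, len(wav) - 8)     # RIFF size
            return data_size
        off += 8 + size + (size & 1)  # odd-sized chunks carry one pad byte
    raise ValueError("no data chunk found")
```

The sketch walks the chunks that precede the data chunk (bext, iXML and fmt in the log above) by trusting their declared sizes, so it only helps when those headers are intact. It does not trim the data size to a whole number of sample frames.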

i hope this problem is fixable. it would "fix" so many more broken wav files out there and make many people happy :) even though i'm not a C programmer, i'm willing to help if i can.

dtill, Aug 13 '23 21:08

Hi @dtill and thanks for your message.

First of all, the chunk names are missing from the log you posted, probably because of the < > chars in the output... I really must change that.

You've understood perfectly how wavfix works: when a data chunk has a null size, wavfix parses the following bytes looking for other chunks or EOF, so it can guess the data chunk size. In the case of your file, it gets fooled by audio samples that happen to look like valid fourCCs.

You're right, lots of hardware implementations (like Sound Devices) terminate the files with the data chunk. However, this is not mandatory in the RIFF/WAV standards, and a lot of software implementations (like Pro Tools, Avid Media Composer, Samplitude, etc.) write chunks after audio data.

Also, one might think that when an error occurs during recording, the wav file will necessarily end with audio data because the recorder has no time to write anything else after it. But we don't actually know when and why the error occurs. For example, a recorder could very well write the audio data, write some other chunks after it once recording was stopped, and then fail when it comes to writing the data chunk size. So we can't presume the file will always end with audio data when the data chunk size is zero.

That's why wavfix works that way.

That being said, wavfix was my first C program and it can greatly be improved. I have no time to work on it right now, but when I do, I'll find a way to properly fix the scenario you mentioned. I see different options for that :

  • work on the audio samples entropy,
  • do more tests on a potential chunk found inside audio data,
  • allow the user to pass a second valid wav file, coming from the same recorder, so wavfix knows what chunk should be present and their order,
  • allow the user to force wavfix with an option like --no-chunk-after-audio
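As an illustration of the second option (more tests on a potential chunk), here is a minimal sketch of one possible check: accept a candidate chunk only if its ID is in a caller-supplied set of known IDs and its declared size ends exactly at EOF or at another known ID. The name `chunk_is_plausible` and the `known_ids` parameter are hypothetical, and this is not wavfix's current behaviour.

```python
import struct


def chunk_is_plausible(buf: bytes, off: int, known_ids: set) -> bool:
    """Return True if the candidate chunk at `off` has a known ID and its
    declared size leads exactly to EOF or to another known chunk ID."""
    if off < 0 or off + 8 > len(buf):
        return False
    if bytes(buf[off:off + 4]) not in known_ids:
        return False
    (size,) = struct.unpack_from("<I", buf, off + 4)  # little-endian uint32
    end = off + 8 + size
    padded_end = end + (size & 1)  # odd-sized chunks should carry a pad byte
    # Tolerate a missing pad byte at EOF, since some writers omit it.
    if end == len(buf) or padded_end == len(buf):
        return True
    nxt = bytes(buf[padded_end:padded_end + 4])
    return padded_end + 8 <= len(buf) and nxt in known_ids
```

With a check like this, the 'bvDD', 'AbvD' and 'MZ5g' candidates from the logs above would be rejected on the ID test alone, provided the caller's set only contains IDs such as bext, iXML, fmt, data or LIST. The trade-off is that genuine vendor-specific chunks missing from the set would be rejected too, which is where a reference file from the same recorder could supply the list.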

Do you see any other way to fix it ?

Thanks again for pointing that out.

agfline, Aug 27 '23 12:08

thanks for coming back to the issue! appreciated! thanks also for pointing out that important information was missing in my post. i added the missing chunk names by manually deleting the <> characters from the wavfix output.

dtill, Aug 28 '23 16:08