wavfix
if the data chunk size is unknown, wavfix does not make a good guess at the data size
first of all, thanks for the great idea of this tool! :) the following is a very common scenario of wav file corruption which is not covered by wavfix: the data chunk size is zero (which often occurs when the recording device loses power). as i understand it, wavfix then tests all the data behind the data chunk header and finds weird "chunks" by checking whether they look like "valid" riff chunks:
https://github.com/agfline/wavfix/blob/master/src/libriff.c#L218-L250
assuming 4 bytes in a row define a new chunk.
in case of sound-devices recorder i mostly use, in my experience there is mostly only some sox --ignore-length XXX.WAV
the output would be a perfect wave file with all the audio data, but all metadata and headers are lost. that's where wavfix should come in ;)
i assume that so far wavfix can only repair wav files whose sample data never happens to contain 4 bytes that look like a chunk ID, whereas in longer files the likelihood of that happening is very high.
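To get a feel for how likely this is, here is a rough back-of-the-envelope estimate in Python. It assumes that "looks like a chunk ID" means 4 consecutive printable ASCII bytes and that sample bytes are uniformly random. Neither assumption comes from the wavfix source, and real audio is not uniformly random, so the result is an order of magnitude at best. The function name is hypothetical.

```python
# Rough estimate only. Assumptions (not taken from the wavfix source):
#   - a candidate chunk ID is any 4 consecutive printable ASCII bytes (0x20..0x7E)
#   - sample bytes are uniformly random, which real audio is not

PRINTABLE = 0x7E - 0x20 + 1  # 95 printable ASCII codes


def expected_fake_fourcc(data_bytes: int) -> float:
    """Expected number of offsets where 4 consecutive bytes are all printable."""
    p = (PRINTABLE / 256) ** 4          # chance that one 4-byte window qualifies
    positions = max(data_bytes - 3, 0)  # number of 4-byte windows in the data
    return p * positions


if __name__ == "__main__":
    size = 210_763_776  # file size reported in the wavfix log in this issue
    print(f"about {expected_fake_fourcc(size):.0f} candidate offsets expected")
```

A real scanner would also check the size field that follows the ID, which removes many of these candidates, but with this many starting points some are likely to survive.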
see my wavfix example output for a broken 900MB wav file:
Processing 'XXX.WAV'
| [w] Wrong RIFF size: 0000000000 B + 8 [file size: 0210763776 B;]
| Current file structure :
| ======================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0000000000 + 4 + 4;]
| [NULL] chunk [offset: 0000006144; size: 0003831957 + 0 + 0;] odd w/ pad
| bvDD chunk [offset: 0003838109; size: 0167829378 + 4 + 4;]
| [NULL] chunk [offset: 0171667495; size: 0039096289 + 0 + 0;] odd w/ pad
|
| [i] Checking data chunk..
| | Chunk size is 0 byte. That is very unlikely..
| | A block of 3831957 unknown bytes comes after data chunk.
| | Assume those are audio data. Merging them with data chunk..
| | done.
|
| Recovered file structure :
| ========================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0003831957 + 4 + 4;] odd w/ pad
| bvDD chunk [offset: 0003838101; size: 0167829378 + 4 + 4;]
| [NULL] chunk [offset: 0171667487; size: 0039096289 + 0 + 0;] odd w/ pad
| [w] 1 unknwon bytes block remains.
| Some programs like Pro Tools use wrong formated files
| like this ( DIGI chunk size missing 1 byte in PT ),
| so we wont correct it. Yet it should play fine anyway.
|
| [i] Saving repaired file to 'XXX_REPAIRED.WAV'
ADDING PADDING BYTE
ADDING PADDING BYTE
| File successfully recovered.
wavfix doesn't accept its own generated _REPAIRED file, and it's not a valid wav file either.
in this case the pattern 'AbvD' in the sample data causes wavfix to think the data chunk is over.
Processing 'XXX_REPAIRED.WAV'
| [w] Wrong RIFF size: 0000000002 B + 8 [file size: 0210763786 B;]
| Current file structure :
| ======================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0003831957 + 4 + 4;] odd w/ pad
| AbvD chunk [offset: 0003838101; size: 0014647876 + 4 + 4;]
| [NULL] chunk [offset: 0018485985; size: 0095141419 + 0 + 0;] odd w/ pad
| MZ5g chunk [offset: 0113627412; size: 0041223827 + 4 + 4;] odd w/ pad
| [NULL] chunk [offset: 0154851247; size: 0016816249 + 0 + 0;] odd w/ pad
| [NULL] chunk [offset: 0171667504; size: 0039096289 + 0 + 0;] odd w/ pad
| [NULL] chunk [offset: 0210763801; size: 0000000001 + 0 + 0;] odd w/ pad
|
| [i] Checking data chunk.. failed.
that's what the output looks like after i manually repaired the file with a hex editor (the Specification of the Broadcast Wave Format (BWF) helped me):
Processing 'XXX_MANUALLY_REPAIR.WAV'
| Current file structure :
| ======================
| bext chunk [offset: 0000000012; size: 0000000858 + 4 + 4;]
| iXML chunk [offset: 0000000878; size: 0000005226 + 4 + 4;]
| fmt chunk [offset: 0000006112; size: 0000000016 + 4 + 4;]
| data chunk [offset: 0000006136; size: 0210757626 + 4 + 4;]
|
| [i] Checking data chunk.. ok.
|
| File ok.
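The hex-editor repair described above can be approximated in a few lines of Python. This is a minimal sketch, not what wavfix does: it assumes the chunks before `data` are intact, that the audio runs from the `data` header to the end of the file, and that the file is under 4 GiB. The function name `patch_wav_sizes` is hypothetical.

```python
import struct


def patch_wav_sizes(buf: bytearray) -> int:
    """Rewrite the RIFF and data chunk sizes in place.

    Assumption: everything after the data chunk header is audio, up to EOF.
    Returns the data size that was written.
    """
    if buf[0:4] != b"RIFF" or buf[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    off = 12  # first chunk starts right after 'RIFF' <size> 'WAVE'
    while off + 8 <= len(buf):
        cid = bytes(buf[off:off + 4])
        (size,) = struct.unpack_from("<I", buf, off + 4)
        if cid == b"data":
            data_size = len(buf) - (off + 8)
            # Sizes are 32-bit little-endian; pack_into raises struct.error
            # if the value does not fit (files of 4 GiB or more).
            struct.pack_into("<I", buf, off + 4, data_size)
            struct.pack_into("<I", buf, 4, len(buf) - 8)
            return data_size
        off += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no data chunk found")
```

Unlike the sox route, this keeps the bext, iXML and fmt chunks untouched, because only two size fields change. It does not trim a partial sample frame at the end of the file or add a pad byte after odd-sized data; a careful repair would handle both.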
i hope this problem is fixable. it would "fix" so many more broken wav files out there and make many people happy :) even though i'm not a C programmer, i'm willing to help if i can.
Hi @dtill and thanks for your message.
First of all, the chunk names are missing from the log you posted, probably because of the < > chars in the output... I really must change that.
You've understood perfectly how wavfix works: in the case of a data chunk with null size, wavfix parses the following bytes looking for other chunks or EOF, so it can guess the data chunk size. In the case of your file, it gets fooled by audio samples that happen to look like a valid fourCC.
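A minimal Python sketch of this kind of scan shows how it gets fooled. It is an illustration under assumptions, not the logic in libriff.c: here a header is accepted when its ID is 4 printable ASCII bytes and its size fits in the remaining bytes. Both function names are hypothetical.

```python
import struct


def looks_like_fourcc(b: bytes) -> bool:
    """Assumed validity test: exactly 4 printable ASCII bytes."""
    return len(b) == 4 and all(0x20 <= c <= 0x7E for c in b)


def find_first_fake_chunk(payload: bytes):
    """Scan bytes that follow a zero-sized data header.

    Returns (offset, id, size) of the first 8 bytes that parse as a
    plausible chunk header, or None if nothing matches.
    """
    for off in range(len(payload) - 7):
        cid = payload[off:off + 4]
        if not looks_like_fourcc(cid):
            continue
        (size,) = struct.unpack_from("<I", payload, off + 4)
        if off + 8 + size <= len(payload):  # declared size must fit in the file
            return off, cid.decode("ascii"), size
    return None


if __name__ == "__main__":
    # Fake "audio": 20 bytes of samples, then bytes that happen to spell a header.
    audio = b"\x00\x01" * 10 + b"AbvD" + struct.pack("<I", 4) + bytes(16)
    print(find_first_fake_chunk(audio))
```

Everything before the returned offset would be taken as audio, and everything after it would be treated as other chunks, which matches the truncated data chunk in the log posted above.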
You're right, lots of hardware implementations (like Sound Devices) terminate the files with the data chunk. However, this is not mandatory in the RIFF/WAV standards, and a lot of software implementations (like Pro Tools, Avid Media Composer, Samplitude, etc.) write chunks after audio data.
Also, one might think that when an error occurs during recording, the wav file will necessarily end with audio data because the recorder has no time to write anything else after it. But we don't actually know when and why the error occurs. For example, a recorder could very well write audio data, write some other chunks after it once it was stopped, then fail when it comes to writing the data chunk size. So we can't presume the file will always end with audio data when the data chunk size is zero.
That's why wavfix works that way.
That being said, wavfix was my first C program and it can be greatly improved. I have no time to work on it right now, but when I do, I'll find a way to properly fix the scenario you mentioned. I see several options for that:
- work on the audio samples entropy,
- do more tests on a potential chunk found inside audio data,
- allow the user to pass a second valid wav file, coming from the same recorder, so wavfix knows what chunk should be present and their order,
- allow the user to force wavfix with an option like --no-chunk-after-audio
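As one possible reading of the "more tests on a potential chunk" option, the sketch below accepts a candidate only if it starts a chain of known chunk IDs that ends exactly at EOF. It is an illustration, not a proposal for the actual wavfix code: the ID list is a small, non-exhaustive sample, and the function name is hypothetical.

```python
import struct

# Illustrative, non-exhaustive set of chunk IDs seen in WAV/BWF files.
KNOWN_IDS = {
    b"fmt ", b"data", b"bext", b"iXML", b"LIST", b"fact",
    b"cue ", b"smpl", b"JUNK", b"DIGI",
}


def chain_reaches_eof(buf: bytes, off: int) -> bool:
    """True if known chunks starting at `off` tile the file exactly up to EOF."""
    if off >= len(buf):
        return False  # no candidate chunk at all
    while off < len(buf):
        if off + 8 > len(buf):
            return False  # not enough bytes left for a chunk header
        cid = buf[off:off + 4]
        if cid not in KNOWN_IDS:
            return False  # unknown ID: more likely audio than a real chunk
        (size,) = struct.unpack_from("<I", buf, off + 4)
        off += 8 + size + (size & 1)  # skip body and optional pad byte
    # Strict check; tolerating a missing final pad byte would be an extension.
    return off == len(buf)
```

With a test like this, a candidate such as 'AbvD' is rejected on its ID alone, and a known ID that appears in the samples by chance is still rejected unless its size field happens to line up with the end of the file. The cost is that a genuine but unlisted chunk after the audio would be merged into the data chunk.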
Do you see any other way to fix it?
Thanks again for pointing that out.
thanks for coming back to the issue! appreciate it! thanks also for pointing out that important information was missing in my post. i added the missing chunk names by manually deleting the <> tags from the wavfix output.