python-soundfile icon indicating copy to clipboard operation
python-soundfile copied to clipboard

Soundfile sometimes reading no data despite there being more data available

Open lugia19 opened this issue 2 years ago • 26 comments
trafficstars

This is a pretty weird usecase so I wasn't able to shrink the issue down to a small sample, sorry for that.

Essentially my code does the following:

  • Creates a BytesIO object
  • Starts a thread that downloads an mp3 file in chunks using requests. This thread will set an event whenever there is data still available to be read. The download thread moves the head back to the position it was before writing, so the file is written correctly.
  • The main thread waits for the "data available" event and uses soundfile to read the newly written data.

The issue that happens is that sometimes soundfile will not read any data, despite there being more already written. This can be solved by recreating the soundfile object, but it causes some weird behavior that leads to skipping in the playback.

Here is the code with the playback part removed, the issue happens in the sfReadAndRemake function:

(removed as there is a better version later in the thread)

And here's the full version, playback included: https://gist.github.com/lugia19/c6fc9de4a8da30d5430ec468c18551a3

lugia19 avatar Mar 26 '23 00:03 lugia19

Soundfile accesses the BytesIO object using its read, write, seek, and tell function. Try wrapping the BytesIO in your own class that logs operations to see what it is doing exactly. Perhaps that will give you insight into this problem.

bastibe avatar Apr 01 '23 07:04 bastibe

I tried, but I couldn't really spot anything that was obviously wrong. BytesIO correctly reports the number of bytes left in the file, but soundFile still reads no data.

One thing I did notice is that when the soundfile is destroyed and remade, the number of frames (soundFile.frames) changes. I tried setting the value of frames of the old soundFile object to the value of the new one, but that still results in it reading no data.

lugia19 avatar Apr 12 '23 19:04 lugia19

glancing over your code briefly, you have this line newSF.seek(bytesSoundFile.tell()), which uses the bytes-offset (tell) as a frame number. That can't be right? Before you read, you should seek to 0.

bastibe avatar Apr 13 '23 07:04 bastibe

Ah, my mistake, I thought tell would give me the current frame number rather than the bytes-offset. How can I get the current frame number instead?

I do this because newSF is the newly created soundFile, so I'm trying to seek it to the old one's position to avoid the playback starting all over again.

lugia19 avatar Apr 13 '23 09:04 lugia19

tell on a BytesIO gives you a bytes offset. Tell on a SoundFile gives you a frame offset.

(forgive me if I misread your source code, my time to spend on this is limited)

bastibe avatar Apr 13 '23 09:04 bastibe

Not a problem, I totally understand. Then yeah, the code is correct since it's using the tell from the previous SoundFile.

I'll try to get the code sample down to a more manageable example again, so it's not too long.

lugia19 avatar Apr 13 '23 09:04 lugia19

While making the smaller code sample, I noticed a couple things.

  1. Whether or not the bug occurs (and where in the file) is entirely random (my favorite). Sometimes it just doesn't happen, it may also depend on the download speed.
  2. It can happen either when the file is at the end, or when it isn't. The error that is thrown is an AssertionError, due to a mismatch in the number of frames actually read vs expected. When the file is already at the end, the expected value is a "sane", number, with the specific file I'm using it's around 800 something frames. When it's not at the end, it's something like 21k, which is almost every frame (but not quite).

Here's the cut down code sample, the bug occurs on line 119. When an AssertionError is first thrown, the code tries to re-run buffer_read without specifying a number of frames, to account for cases where there isn't enough data left in the file. If that fails again, then it must be the bug.

Hopefully this example is clearer than the previous one, this was about as concise as I could get it.

downloadThreadFunction is merely the function that handles downloading the file and writing it to the BytesIO object, setting a couple of events to make sure data is only read when it's available.

main is where the actual reading happens. In the main While loop, it waits for the blockDataAvailable to be set by the download thread, then tries reading.

from __future__ import annotations

import io
import logging
import os
import threading
import requests
import soundfile as sf

#Set to logging.DEBUG to log all bytesIO operations
logging.basicConfig(level=logging.ERROR)

playbackBlockSize = 2048
downloadChunkSize = 4096
bytesSoundFile: sf.SoundFile

files:dict = {
    "ffmpeg": {
        "url":"https://files.catbox.moe/nygh96.mp3",
        "localfile":r"C:\Users\lugia19\Desktop 2\audioFiles\ffmpegSampleMono.mp3"
    },
    "11": {
        "url":"https://files.catbox.moe/hpemr0.mp3",
        "localfile":r"C:\Users\lugia19\Desktop 2\audioFiles\11sample.mp3"
    }
}

#Just a wrapper class that logs all operations.
class BytesIOWrapper(io.BytesIO):
    def read(self, __size: int | None = None) -> bytes:
        logging.debug(f"Reading {__size} bytes")
        return super().read(__size)

    def write(self, __buffer) -> int:
        logging.debug(f"Writing {len(__buffer)} bytes")
        return super().write(__buffer)

    def seek(self, __offset: int, __whence: int = 0) -> int:
        logging.debug(f"Seeking by {__offset} from {__whence}")
        newPos = super().seek(__offset, __whence)
        logging.debug(f"Seeked to {newPos}")
        return newPos

    def tell(self) -> int:
        curPos = super().tell()
        logging.debug(f"Telling current position: {curPos}")
        return curPos

bytesFile = BytesIOWrapper()

bytesLock = threading.Lock()  # Lock for the bytes file

# Sent from the download thread when the download has completed.
downloadDoneEvent = threading.Event()
# Sent from the download thread when there is enough new data for the playback thread to read a block.
blockDataAvailable = threading.Event()
soundFileCreated = threading.Event()

def downloadThreadFunction(URL):
    streamedRequest = requests.get(URL, stream=True)
    streamedRequest.raise_for_status()
    totalLength = 0
    print("Starting iter...")
    for chunk in streamedRequest.iter_content(chunk_size=downloadChunkSize):
        totalLength += len(chunk)
        if len(chunk) != downloadChunkSize:
            print("Writing weirdly sized chunk (" + str(len(chunk))+")...")

        # Write the new data then seek back to the initial position.
        with bytesLock:
            global bytesFile
            lastReadPos = bytesFile.tell()
            bytesFile.seek(0, os.SEEK_END)
            bytesFile.write(chunk)
            endPos = bytesFile.tell()
            bytesFile.seek(0)
            bytesFile.seek(lastReadPos)
            if endPos-lastReadPos > playbackBlockSize:
                if not blockDataAvailable.is_set():
                    print(f"Raise available data event: {endPos-lastReadPos} bytes available")
                blockDataAvailable.set()
        if not soundFileCreated.is_set():
            soundFileCreated.wait() #We've written a chunk, which should be enough to create the soundfile. Wait for the other thread.
    print(f"Download finished - {totalLength} bytes")
    with bytesLock:
        lastReadPos = bytesFile.tell()
        bytesFile.seek(0)
        fp = open("downloadedFile.mp3","wb+")
        fp.write(bytesFile.read())
        fp.flush()
        fp.close()
        bytesFile.seek(lastReadPos)
    downloadDoneEvent.set()
    blockDataAvailable.set()  # Ensure that the other thread knows data is available
    return

def checkRemainingLength():
    currentPos = bytesFile.tell()
    endPos = bytesFile.seek(0, os.SEEK_END)
    bytesFile.seek(currentPos)
    remainingBytes = endPos - currentPos
    print(f"{remainingBytes} < {playbackBlockSize} and {not downloadDoneEvent.is_set()}")
    if remainingBytes < playbackBlockSize and not downloadDoneEvent.is_set():
        print("Marking no available blocks...")
        # Download isn't over and we've consumed enough data to where there isn't another block available.
        blockDataAvailable.clear()
    return remainingBytes


def main():
    fileData = files["11"]
    localFile = fileData["localfile"]
    testsf = sf.SoundFile(open(localFile, "rb"))
    print(f"File loaded from disk. \nSeekable:{testsf.seekable()} \nNumber of frames:{testsf.frames}")

    #This starts a thread which will download the file chunk by chunk.
    downloadThread = threading.Thread(target=downloadThreadFunction, args=(fileData["url"],))
    downloadThread.start()

    #soundFileCreated.set()
    #downloadThread.join()
    #We wait for the first block of data to be available, then we create the soundFile.
    #If the creation fails, we wait for the next one and try again.
    while True:
        try:
            blockDataAvailable.wait()
            with bytesLock:
                global bytesSoundFile
                bytesSoundFile = sf.SoundFile(bytesFile, mode="r")
                print(f"File loaded from download with {bytesFile.tell()} bytes read. \nSeekable: {bytesSoundFile.seekable()} \nNumber of frames: {bytesSoundFile.frames}")
                checkRemainingLength()
                soundFileCreated.set()
                break
        except:
            blockDataAvailable.clear()


    print("Starting playback...")
    while True:

        blockDataAvailable.wait()  # Wait until a block of data is available.
        with bytesLock:
            try:
                print(f"Current position: \nIn bytes:{bytesFile.tell()} \nIn frames:{bytesSoundFile.tell()}")
                data = bytesSoundFile.buffer_read(playbackBlockSize, dtype="float32")
            except AssertionError as e:
                print("Got back an AssertionError. This could be due to not enough data left. Let's read what is available.")
                print(f"Position after error:{bytesFile.tell()}")
                try:
                    curPos = bytesFile.tell()
                    data = bytesSoundFile.buffer_read(dtype="float32")
                except AssertionError as en:
                    print("Still got a mismatch.")
                    newPos = bytesFile.tell()
                    endPos = bytesFile.seek(0, os.SEEK_END)
                    bytesFile.seek(curPos)
                    print(f"Were we at the end before the read? {curPos==endPos}")
                    print(f"Are we at the end afterwards? {newPos == endPos}")
                    if endPos == curPos and downloadDoneEvent.is_set():
                        print("We're done. Exit.")
                        break
                    else:
                        print("We're not at the end of the file.")
                        print("Let's try recreating the file and seeing if that gives us a different result.")
                        bytesFile.seek(0)
                        newSF = sf.SoundFile(bytesFile, mode="r")
                        newSF.seek(bytesSoundFile.tell())
                        bytesSoundFile = newSF
                        readData = bytesSoundFile.buffer_read(playbackBlockSize, dtype="float32")
                        print(f"New soundFile read {len(readData)} bytes.")
                        #Exit.
                        input("The bug happened. Press Enter to exit.")
                        break

            print(f"Read {len(data)} bytes from soundFile.")
            if len(data) == 0:
                print("Got back no data.")
            if checkRemainingLength() == 0 and downloadDoneEvent.is_set():
                print("Done! Exiting.")
                return

    print("While loop done.")


if __name__ == "__main__":
    main()```

lugia19 avatar Apr 14 '23 11:04 lugia19

The assertion error happens if sf_read_float returns fewer frames than requested. Normally, this is taken care of by Soundfile._check_frames, which makes sure that there are never frames requested beyond the end of the file. However, this correction is only possible for files that support seeking. Perhaps the issue might be that some of your files do not support seeking? Can you confirm that the error only happens when the end of the file is reached?

If that's the case, there are two possible fixes:

  • we can simply remove the assert in buffer_read, which will then always return the requested number of frames, but may fill unavailable frames with zeros.
  • we can truncate the buffer returned by buffer_read to only contain the available number of frames.

(the latter requires removing the assertion, and changing the return line of buffer_read to be modified to return _ffi.buffer(cdata, read_frames * self.channels * _ffi.sizeof(ctype)))

bastibe avatar Apr 15 '23 08:04 bastibe

Okay, I did some more checking and logging. Of note, the file is seekable, according to SoundFile.seekable()

In short, there are two cases where the bug happens (when it does happen, as it's not fully consistent). I'll describe the behavior of each case.

When the file isn't over

buffer_read is requested to read 2048 frames. _check_frames does not reduce this number, so it must be reading that there are at least 2048 frames left to read, if I understand this correctly. The read operation however only returns 1152 frames, leading to the AssertionError.

After this, my code tries a buffer_read without specifying the number of frames to read. _check_frames detects that there are 21533 frames left to read, but the read operation returns 0 frames.

When the file is over

Something similar happens. The first buffer_read returns 1408 frames instead of 2048, which is what _check_frames expected, and the second one returns 0 instead of 864, which is once again what _check_frames expected. This case is honestly not as a big a problem as I can simply check for whether or not the file is over, and handle it accordingly.

Something else I've noticed during testing is that the number of frames (SoundFile.frames) is incorrect compared to the full file, and only updates when the SoundFile is recreated (which also happens to fix the issue). Might be worthy of note.

lugia19 avatar Apr 16 '23 08:04 lugia19

That's regrettable. So apparently, libsndfile gives inconsistent numbers of frames between the info struct created by sf_open (exposed in soundcard as SoundFile.frames) and actually reading the file. I think you should take this issue to libsndfile directly, as it does not seem related to soundfile.

Does this happen for all file types, or only some files? It is conceivable that this might be caused by a corrupt file header that gives an incorrect number of frames.

If this is indeed erroneous behavior, soundfile's current exception is probably reasonable. Otherwise, it might be better to return whatever fractional data we get from libsndfile, and at most issue a warning of a potentially corrupted file header.

I am currently leaning towards the latter, as libsndfile returns first partial data, then no data, which is consistent with a regular file end, and might merely indicate a corrupted header or a non-critical inaccuracy in libsndfile.

It is also worth noting that this is the very first time I have ever heard of this issue. Being that soundfile has millions of active users, this might indicate a local problem on your end (i.e. a corrupted file header), rather than a bug in libsndfile.

bastibe avatar Apr 16 '23 13:04 bastibe

I think it's definitely an issue with the fact that the SoundFile is created without having the entire file written to disk, which might be why no one has encountered it before.

I have done the following:

  • Tested with a wav file instead of an mp3 The initial number of frames is incorrect, and it simply stops reading past the initial (incorrect) number of frames, this time without throwing an AssertionError as _check_frames reports there are 0 frames as well. Of course this could probably be fixed with read/write mode for the wav, but that's unavailable for mp3 files.

  • Tested with different mp3 files It seems like there IS an issue with the header of the original file as using a different mp3 file generated via ffmpeg the initial number of frames is correct. HOWEVER, the bug is still present, and now recreating the SoundFile no longer fixes it, as it continues reading zero frames (despite the file not being over).

I can upload the files used for testing if that might be helpful. Also, I had to make some small adjustements to avoid thread starvation, which have been added to the code I posted earlier.

In case this is important, the way I'm checking the "correct" number of frames is by simply downloading the file and creating a SoundFile.

lugia19 avatar Apr 16 '23 14:04 lugia19

I see. I think we should replace the assertion with a warning then, but always return whatever libsndfile reads. That seems like a more robust process than trusting .frames.

bastibe avatar Apr 17 '23 07:04 bastibe

That's probably a good idea.

As for the bug where the read number of frames is zero (despite _check_frames reporting that there should be more frames to read), I suppose I should take that to libsndfile instead?

lugia19 avatar Apr 17 '23 15:04 lugia19

I think that's most likely a header corruption issue. It might still be worth pursuing with libsndfile to get clarity, though.

Would you like to contribute the change as a pull request? No problem if not.

bastibe avatar Apr 18 '23 07:04 bastibe

Sure, I'll make a pull request for the change later.

As for the bug, I have done more testing and (after fixing a mistake I made when making the cut down code sample which was making debugging needlessly complicated) I've narrowed it down to definitely something being wrong with the file itself, as at one point it simply stops returning any data (without any AssertionErrors, it just doesn't return anything)

It wasn't easy as the other files I was trying were giving me a "unspecified internal error", but after looking at the files more closely I realized that they were stereo, unlike the original file which was mono. Converting them to mono fixed the issue, so now the "known good" files work without issues.

I BELIEVE the problem lies in the fact that the file has no specified length in its header, as using ffprobe gives me Estimating duration from bitrate, this may be inaccurate which does not happen with the other files.

I modified the code to load the file from disk and from the download, and to print the detected number of frames for both.

For the ffmpeg file (which works without issues) it correctly reads the number of frames both times:

File loaded from disk. 
Seekable:True 
Number of frames:757760

File loaded from download with 522 bytes read.
Seekable: True 
Number of frames: 757760

Meanwhile for the other file it's different:

File loaded from disk. 
Seekable:True 
Number of frames:116894

File loaded from download with 208 bytes read. 
Seekable: True 
Number of frames: 22685

In short (albeit this is a guess) I think this is the source of the problem, as it starts returning no data exactly at the point where the initial estimated length ends (22685 frames in).

This is also why re-creating the SoundFile fixes it, as it re-estimates the length (and usually, by that time the file is fully downloaded so it will be correct)

lugia19 avatar Apr 18 '23 13:04 lugia19

Very interesting! This means that relying on .frames is therefore actually incorrect, as .frames can be an estimate. This should probably be added to the documentation as well.

Now that you mention it, I remember seeing this in *.au audio files, where the number of frames is stored as a 32 bit integer, but -1 is explicitly defined for unknown, enabling >4GB files (at which point it falls back on the file's size). So apparently the same logic is possible for compressed files as well. Fascinating.

Thank you for the analysis!

bastibe avatar Apr 19 '23 06:04 bastibe

Now that I've actually narrowed it down to the real bug, here's a minimal code sample showcasing the issue. It'll happen with either a .wav file or a .mp3 without length information in the header.

Do you think this is something that could be addressed in soundfile or would it require a change in libsndfile?

import io
import os
import soundfile as sf

#Initialize a soundFile from disk normally
sourceFile = r"C:\Users\lugia19\Desktop 2\audioFiles\11sample.mp3"    #Either a .wav or a .mp3 without length in the header
fullSoundFile = sf.SoundFile(sourceFile,"rb")
originalFile = open(sourceFile,"rb")

partialFile = io.BytesIO()
partialFile.write(originalFile.read(4096)) #Write 2048 bytes to the temporary file
partialFile.seek(0)    #Seek it back so the SoundFile can be created properly
partialSoundFile = sf.SoundFile(partialFile)    #Create the partial soundfile
print(f"The full number of frames is {fullSoundFile.frames} and the partial estimate is {partialSoundFile.frames}")
lastReadPos = partialFile.tell()    #Make sure the head is positioned correctly
partialFile.write(originalFile.read())  #Write out the rest of the audio file
partialFile.seek(lastReadPos)

while True:
    readData = partialSoundFile.read(128)   #Read until we don't get any data back
    if len(readData) == 0:
        curPos = partialFile.tell()
        endPos = partialFile.seek(0,os.SEEK_END)
        partialFile.seek(curPos)
        print(f"Cannot read anymore data! We're currently at frame {partialSoundFile.tell()} out of {partialSoundFile.frames}, but we know the total should be {fullSoundFile.frames}")
        if partialSoundFile.tell() != fullSoundFile.frames: #If our position doesn't match what we KNOW is the actual end of the file
            print("We're not at the end yet we can't read any more data.")
            input("Press enter to exit...")
            exit()

lugia19 avatar Apr 20 '23 22:04 lugia19

If I understand this correctly, there's a SoundFile issue in dealing with inaccurate headers. It's not exactly a bug, since headers are supposed to be accurate, but we're not handling the issue as gracefully as libsndfile. As outlined above, we can handle it better by replacing the assert with a warning, and truncating the output where necessary.

Separate from that, you seem to describe some issue with read/write positions not being accurate? I think this is just a symptom of the above, since SoundFile stops reading before the end of the file is reached, due to the inaccurate header. Your method of using the BytesIO tell is fascinating, actually! It took me a bit to figure out how it even works! But I'm not sure it's actually a meaningful measure of an incomplete read, as there might be non-audio chunks at the end of a file that SoundFile rightfully ignores. WAV in particular likes to put metadata (and all sorts of wonderful nonsense!) chunks at the back.

bastibe avatar Apr 21 '23 06:04 bastibe

As outlined above, we can handle it better by replacing the assert with a warning

The assert issue is mostly secondary - the issue I'm running into now is the zero-byte reads, which do not cause an AssertionError at all. I'll try and explain the issue properly.

The first part of the code, before the While loop, (with the soundFile creation and seek/tell operations) is just meant to set up the partialSoundFile. In short, the code:

  • Creates fullSoundFile by simply loading the entire audio from disk. This is our "reference" so to speak.
  • Writes 4096 bytes of the audio data to a BytesIO
  • Creates the partialSoundFile with that amount of data (which leads to the amount of frames being incorrect, as can be seen by comparing it to fullSoundFile)
  • Writes the rest of the audio data to the BytesIO

What happens during the While loop is the following:

  • partialSoundFile reads data correctly
  • Once it reaches the amount of frames it estimated being in the file earlier, it stops returning any data
  • We know that this isn't the end of the audiodata, as the fullSoundFile reports there being more frames after this.

In short, the problem is that it stops reading at the initial estimated amount of frames, despite the fact that more audio data has since been added to the file. (I updated the code sample to reflect this using the number of frames instead of the bytesIO position)

Separate from that, you seem to describe some issue with read/write positions not being accurate?

Apologies, I probably didn't word that correctly, I was just trying to explain what the lines of code do. The positions are not inaccurate per se, it's just that writing to the BytesIO directly moves the read/write head, and can cause issues with SoundFile, so I'm making sure that doesn't happen by seeking it back (similar to this issue https://github.com/bastibe/python-soundfile/issues/333)

lugia19 avatar Apr 21 '23 09:04 lugia19

The way I've worked around for now is by checking if the data I recieve is the expected length, and if it isn't, I recreate the SoundFile. By checking if the data is less, not just zero, I was able to eliminate the audio cutting out.

Here's the snippet of code:

frameSize = bytesSoundFile.channels*sf._ffi.sizeof(bytesSoundFile._check_dtype(defaultdtype))

def sfReadAndRemake(framesToRead: int = -1, dtype=defaultdtype):
    global bytesSoundFile
    readData = bytesSoundFile.buffer_read(framesToRead, dtype=dtype)
    print(f"Expected {framesToRead * frameSize} bytes, got back {len(readData)}")
    if len(readData) < framesToRead*frameSize:
        print("Insufficient data read.")
        curPos = bytesFile.tell()
        endPos = bytesFile.seek(0, os.SEEK_END)
        if curPos != endPos:
            print("We're not at the end of the file. Check if we're out of frames.")
            print("Recreating soundfile...")
            bytesFile.seek(0)
            newSF = sf.SoundFile(bytesFile, mode="r")
            newSF.seek(bytesSoundFile.tell() - int(len(readData)/frameSize))
            if newSF.frames > bytesSoundFile.frames:
                print("Frame counter was outdated.")
                bytesSoundFile = newSF
                readData = bytesSoundFile.buffer_read(framesToRead, dtype=dtype)
                print("Now read " + str(len(readData)) +
                      " bytes.")
                print("")

    return readData

Obviously it would still be nice if the bug can actually be fixed.

lugia19 avatar Apr 22 '23 11:04 lugia19

The assert issue is mostly secondary - the issue I'm running into now is the zero-byte reads, which do not cause an AssertionError at all.

If I understand this correctly, then the assert actually very much is the issue!

What happens during the While loop is the following:

  • partialSoundFile reads data correctly
  • Once it reaches the amount of frames it estimated being in the file earlier, it stops returning any data
  • We know that this isn't the end of the audiodata, as the fullSoundFile reports there being more frames after this.

This bug is caused by the assert. Libsndfile correctly returns more frames, but the assert stops SoundFile from reading beyond .frames. So if we just didn't assert, then we'd continue reading despite the incorrect .frames, which is exactly what we want. Or am I misunderstanding something?

bastibe avatar Apr 23 '23 13:04 bastibe

Maybe I'm the one misunderstanding, I'll check again and let you know, I might've just gotten it wrong

lugia19 avatar Apr 23 '23 13:04 lugia19

So, after some more testing, I have the following to report. Note that I switched to read instead of buffer_read for these tests.

To make a concrete example, I logged the positions in the file I was using for testing. Once the position (as reported by BytesIO.tell) reaches 3426 bytes, it no longer reads any data.

This happens even if the call to _check_frames is commented out, meaning that _array_io is called with 2048 as the value of frames (instead of 0, which is what would happen if the _check_frames call was present).

So regardless of what the frames parameter is, once the end of the initial, partially written file is reached, the call to _array_io will return nothing. I checked and this goes all the way down to sf_readf_double - it is called with 2048 frames as an argument, but it returns zero frames.

lugia19 avatar Apr 24 '23 13:04 lugia19

Thank you very much for the analysis. In that case, it is indeed down to libsndfile. I'd recommend opening an issue at libsndfile. They are usually very receptive to well-researched bug reports such as this.

bastibe avatar Apr 25 '23 14:04 bastibe

For anyone stumbling across this issue, the assert actually ended up giving me more trouble.

It appears that for certain kinds of mp3 files (I've mostly had it happens with files that have ID3v2 tags, but there are others which I still haven't figured out why) it gets "stuck", unable to read the correct number of frames even if the soundfile is remade according to what I found in the rest of the issue.

Here's how I fixed it:

  1. Subclass SoundFile to return the read data regardless when the AssertionError happens, as part of the exception's args:
class BodgedSoundFile(sf.SoundFile):
    def buffer_read(self, frames=-1, dtype=None):
        from _soundfile import ffi as _ffi
        frames = self._check_frames(frames, fill_value=None)
        ctype = self._check_dtype(dtype)
        cdata = _ffi.new(ctype + '[]', frames * self.channels)
        read_frames = self._cdata_io('read', cdata, ctype, frames)
        assert read_frames == frames, _ffi.buffer(cdata)    #Return read data as exception.args[0]
        return _ffi.buffer(cdata)
  1. Handle the exception like so:
# THIS FUNCTION ASSUMES YOU'VE GIVEN THE THREAD RUNNING IT THE _bytesLock LOCK.
    def _soundFile_read_and_fix(self, dataToRead:int=-1, dtype=_defaultDType):
        preReadFramePos = self._bytesSoundFile.tell()
        preReadBytesPos = self._bytesFile.tell()
        try:
            readData = self._bytesSoundFile.buffer_read(dataToRead, dtype=dtype)    #Try to read the data
        except (AssertionError, soundfile.LibsndfileError):
            #The bug happened, so we must be at a point in the file where the reading fails.

            logging.debug("The following is some logging for a new and fun soundfile bug, which I (poorly) worked around. Lovely.")
            logging.debug(f"Before the bug: frame {preReadFramePos} (byte {preReadBytesPos})")
            logging.debug(f"After the bug: frame {self._bytesSoundFile.tell()} (byte {self._bytesFile.tell()})")

            #Release the lock on the underlying BytesIO to allow the download thread to download a bit more of the file.
            self._bytesLock.release()
            time.sleep(0.1) #Wait.
            self._bytesLock.acquire()

            #Let's seek back and re-create the SoundFile, so that we're sure it's fully synched up.
            self._bytesFile.seek(0)
            newSF = BodgedSoundFile(self._bytesFile, mode="r")
            newSF.seek(preReadFramePos)
            self._bytesSoundFile = newSF
            logging.debug(f"Done recreating, now at {self._bytesSoundFile.tell()} (byte {self._bytesFile.tell()}).")

            #Try reading the data again. If it works, good.
            try:
                readData = self._bytesSoundFile.buffer_read(dataToRead, dtype=dtype)
            except (AssertionError, soundfile.LibsndfileError) as ea:
                #If it fails, get the partial data from the exception args.
                readData = ea.args[0]



        #This is the handling for the bug that's described in the rest of the issue. Irrelevant to this new one.
        if dataToRead * self._frameSize != len(readData):
            logging.debug(f"Expected {dataToRead * self._frameSize} bytes, but got back {len(readData)}")
        if len(readData) < dataToRead * self._frameSize:
            logging.debug("Insufficient data read.")
            curPos = self._bytesFile.tell()
            endPos = self._bytesFile.seek(0, os.SEEK_END)
            if curPos != endPos:
                logging.debug("We're not at the end of the file. Check if we're out of frames.")
                logging.debug("Recreating soundfile...")
                self._bytesFile.seek(0)
                newSF = BodgedSoundFile(self._bytesFile, mode="r")
                newSF.seek(self._bytesSoundFile.tell() - int(len(readData) / self._frameSize))
                if newSF.frames > self._bytesSoundFile.frames:
                    logging.debug("Frame counter was outdated.")
                    self._bytesSoundFile = newSF
                    readData = self._bytesSoundFile.buffer_read(dataToRead, dtype=dtype)
                    logging.debug("Now read " + str(len(readData)) +
                          " bytes. I sure hope that number isn't zero.")
        return readData

lugia19 avatar Sep 15 '23 21:09 lugia19

Wow, thank you for your efforts! I hope there will be an upstream response in libsndfile.

bastibe avatar Sep 23 '23 10:09 bastibe