MHKiT-Python

_RDIReader.read_buffer() gets stuck in endless loop

Open ululi1970 opened this issue 7 months ago • 4 comments

Describe the bug:

While reading a long pd0 file from a Sentinel V, I noticed that under certain circumstances read_buffer() enters an infinite loop in which the pointer into the data stream keeps cycling back over the same group of ensembles.

To Reproduce:

The actual file is rather large. At any rate, I have a fix, detailed at the bottom, that has the added bonus of increasing the processing rate by about 60% (likely due to the precaching of the whole ensemble to execute the checksum).

Expected behavior:

A clear and concise description of what you expected to happen.

Screenshots:

This is time vs. ensemble number if the file is read correctly

Image

Note how time increases monotonically with ensemble number.

This is time vs. ensemble number when the file is not read correctly. Note how after about 2000 ensembles, time stops increasing.

Image

Zooming in, it shows that the time is stuck in a loop.

Image
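The stall visible in the plots can also be located numerically. The snippet below is a minimal sketch, assuming the ensemble times are available as a plain sequence of numeric timestamps (for example, seconds since some epoch); the function name first_non_increasing is hypothetical and not part of MHKiT.

```python
from typing import Optional, Sequence


def first_non_increasing(times: Sequence[float]) -> Optional[int]:
    """Return the index of the first sample whose time does not increase.

    Returns None when the whole sequence is strictly increasing.
    """
    for i in range(1, len(times)):
        # A repeated or earlier timestamp means the reader went backwards.
        if times[i] <= times[i - 1]:
            return i
    return None


if __name__ == "__main__":
    # Synthetic example: time advances, then cycles over the same values.
    looping = [0.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
    print(first_non_increasing(looping))
```

Applied to the time coordinate of a badly read file, the returned index would mark the ensemble where the reader first jumped backwards.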

Desktop (please complete the following information):

  • OS: Linux, Ubuntu 24.10
  • MHKiT Version: 0.8.2

Additional context:

The solution I found is to add the following after line 454 in read_buffer():

        # make sure that the checksum for this ensemble is correct
        noBytesInEnsemble=fd.read_i16(1)
        # go back to start of ensemble
        fd.seek(-4,1)
        # pack the entire ensemble into a bytearray
        bytesInEnsemble = bytearray(fd.read_ui8(noBytesInEnsemble))
        # get checksum (2 bytes unsigned integer)   
        checksum = fd.read_ui16(1)
        # calculate checksum and check
        # if the checksum is wrong, back up 100 bytes and search for the next 
        # ensemble
        if (sum(bytesInEnsemble) & 0xFFFF ) != checksum:
            logging.warning("Ensemble starting at startpos {} has a checksum error".format(startpos))
            logging.warning("checksum calculated = %s, actual checksum = %s\n" % ((sum(bytesInEnsemble) & 0xFFFF), checksum))
            fd.seek(-100, 1)
            self.read_buffer()
        else:
            # go back to start of ensemble
            fd.seek(-noBytesInEnsemble, 1)

and the following

         fd.seek(-100) 

before the return statement at the end of the function.
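For anyone who wants to try the checksum test outside of rdi.py, here is a minimal standalone sketch of the same check on an in-memory buffer. It assumes the PD0 layout that the patch above relies on: a 0x7F 0x7F header ID, a 2-byte little-endian byte count that covers everything except the checksum, and a trailing 2-byte checksum equal to the sum of those bytes modulo 65536. The names verify_ensemble and make_ensemble are hypothetical and not part of MHKiT or dolfyn, and the byte count is read as unsigned here, whereas the patch uses read_i16.

```python
import struct

PD0_HEADER_ID = b"\x7f\x7f"  # assumed PD0 ensemble header ID


def verify_ensemble(buf: bytes, start: int = 0) -> bool:
    """Return True if the ensemble beginning at `start` has a valid checksum."""
    # Need at least the header ID (2 bytes) and the byte count (2 bytes).
    if len(buf) < start + 4 or buf[start:start + 2] != PD0_HEADER_ID:
        return False
    # Byte count: little-endian uint16, excludes the 2 checksum bytes.
    (n_bytes,) = struct.unpack_from("<H", buf, start + 2)
    end = start + n_bytes
    if n_bytes < 4 or len(buf) < end + 2:
        return False  # implausible count or truncated ensemble
    (stored,) = struct.unpack_from("<H", buf, end)
    # Checksum is the sum of all ensemble bytes, modulo 65536.
    return (sum(buf[start:end]) & 0xFFFF) == stored


def make_ensemble(payload: bytes) -> bytes:
    """Build a synthetic ensemble (illustration only, not real ADCP data)."""
    n_bytes = 4 + len(payload)
    body = PD0_HEADER_ID + struct.pack("<H", n_bytes) + payload
    return body + struct.pack("<H", sum(body) & 0xFFFF)


if __name__ == "__main__":
    good = make_ensemble(bytes(range(10)))
    bad = bytearray(good)
    bad[5] ^= 0xFF  # corrupt one payload byte
    print(verify_ensemble(good), verify_ensemble(bytes(bad)))
```

Working on a buffer rather than the file object keeps the check free of seek bookkeeping, which is where the pointer arithmetic in the patch is easiest to get wrong.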

ululi1970 avatar Apr 01 '25 21:04 ululi1970

@ululi1970 thanks for bringing this to our attention and providing a solution.

Is it possible you could share a subset of the data causing the issue with @jmcvey3 for us to debug?

ssolson avatar Apr 03 '25 16:04 ssolson

Sure, I have attached a zip file with the pd0 file and the rdi.py that I have modified to read it. Note that you will need the tqdm package installed, or you can just remove the import from line 300 and tqdm from line 301.

files.zip

ululi1970 avatar Apr 03 '25 18:04 ululi1970

I really appreciate this @ululi1970!

I see that there are several other IDs that are undocumented thus far in dolfyn (0x7000, 0x7001, 0x7003, 0x7004), and it looks like these aren't being skipped properly (hence the need for fd.seek(-100) so the reader can find its place again). I'll work on adding these in, since it looks like I'll need them to return the correct sampling frequency and read measurements from the fifth beam.

I've been trying for a while to get the code to properly skip unknown data structures (since TRDI tends to adjust data structures often enough), and not fall into the forever loop you've described.
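To illustrate the general idea of a resync that cannot loop forever, here is a rough sketch on an in-memory buffer. It is not the approach taken in dolfyn, only an illustration under the same PD0 assumptions as the checksum patch above (0x7F 0x7F header ID, little-endian byte count excluding the checksum, 16-bit sum checksum). All function names are hypothetical. The key property is that the search position only ever moves forward.

```python
import struct
from typing import Iterator, Optional, Tuple

HEADER_ID = b"\x7f\x7f"  # assumed PD0 ensemble header ID


def _checksum_ok(buf: bytes, start: int) -> bool:
    """Check the 16-bit sum checksum of the candidate ensemble at `start`."""
    if len(buf) < start + 4:
        return False
    (n_bytes,) = struct.unpack_from("<H", buf, start + 2)
    end = start + n_bytes
    if n_bytes < 4 or len(buf) < end + 2:
        return False
    (stored,) = struct.unpack_from("<H", buf, end)
    return (sum(buf[start:end]) & 0xFFFF) == stored


def find_next_ensemble(buf: bytes, pos: int) -> Optional[int]:
    """Return the start of the next checksum-valid ensemble at or after pos."""
    while True:
        idx = buf.find(HEADER_ID, pos)
        if idx < 0:
            return None  # no further candidates in the buffer
        if _checksum_ok(buf, idx):
            return idx
        pos = idx + 1  # false or corrupt header: step forward, never back


def iter_ensembles(buf: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte ranges of valid ensembles, skipping bad data."""
    pos = 0
    while (start := find_next_ensemble(buf, pos)) is not None:
        (n_bytes,) = struct.unpack_from("<H", buf, start + 2)
        end = start + n_bytes + 2  # include the 2 checksum bytes
        yield start, end
        pos = end  # always greater than start, so the loop must terminate


def make_ensemble(payload: bytes) -> bytes:
    """Build a synthetic ensemble (illustration only, not real ADCP data)."""
    body = HEADER_ID + struct.pack("<H", 4 + len(payload)) + payload
    return body + struct.pack("<H", sum(body) & 0xFFFF)


if __name__ == "__main__":
    first = make_ensemble(b"\x01\x02\x03")
    corrupt = bytearray(first)
    corrupt[5] = 0xFF  # break the checksum of the middle ensemble
    stream = b"\x00\x01\x02" + first + bytes(corrupt) + make_ensemble(b"\x04\x05")
    print(list(iter_ensembles(stream)))
```

Because a failed candidate advances the search by one byte and a good ensemble advances it past its own checksum, the same bytes are never revisited, whatever unknown data blocks sit in between.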

jmcvey3 avatar Apr 30 '25 01:04 jmcvey3

Issue should be solved in PR #396. @ululi1970 please see if that PR fix works for you.

jmcvey3 avatar May 05 '25 18:05 jmcvey3