sigmf-python _read_datafile() calls file close() on every partial dataset read

https://github.com/sigmf/sigmf-python/blob/c5d194d5e659def926d25737baa7b6cbbb4887bd/sigmf/sigmffile.py#L679

I have been working on a project which operates wonderfully on a laptop with 16 GB of RAM but has to be simplified if it will ever run successfully on a Raspberry Pi Zero 2 W with 512 MB of RAM.

I see that when a dataset is read incrementally (to reduce the memory footprint) rather than all at once (a non-issue on a laptop with gobs of memory), each call to read_samples() in turn calls _read_datafile(), which performs an open(), a seek(), and a close().

Perhaps the file management should be promoted to read_samples() and the parent SigMFFile class? Dataset reads could then be performed with a single open() and one concluding close(), with any number of seek() and read() calls in between.
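To make the suggestion concrete, here is a minimal sketch of the open-once pattern using only the standard library. It is not sigmf-python code: the class name OpenOnceReader, the file name, and the assumption that the data file holds native-endian 32-bit floats are all hypothetical, chosen only for illustration.

```python
import array
import os
import tempfile


class OpenOnceReader:
    """Hypothetical reader: one open(), many seek()/read() calls, one close()."""

    ITEM_SIZE = 4  # bytes per sample; assumes 32-bit float samples

    def __init__(self, path):
        self._fh = open(path, "rb")  # the only open() for the whole session

    def read_samples(self, start_index, count):
        # Jump to the requested sample and read at most `count` samples.
        self._fh.seek(start_index * self.ITEM_SIZE)
        samples = array.array("f")
        samples.frombytes(self._fh.read(count * self.ITEM_SIZE))
        return samples

    def close(self):
        self._fh.close()  # the single concluding close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


def demo():
    """Write 1000 samples to a temporary file, then read them back in chunks."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.sigmf-data")
        with open(path, "wb") as fh:
            array.array("f", [float(i) for i in range(1000)]).tofile(fh)
        chunks = []
        with OpenOnceReader(path) as reader:
            for start in range(0, 1000, 256):
                chunks.append(reader.read_samples(start, 256))
        return chunks
```

Only one chunk needs to be resident at a time, and the file handle is reused across every read instead of being reopened.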

This enhancement would also make it possible to use mmap(), placing the memory management burden on the OS rather than the Python runtime.
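As a rough illustration of the mmap() idea, the sketch below maps a data file read-only with Python's standard mmap module and pulls out a range of samples. The function names, the file name, and the native-endian 32-bit float layout are assumptions made for the example; nothing here is taken from sigmf-python.

```python
import array
import mmap
import os
import tempfile


def read_slice_mmap(path, start_index, count):
    """Return `count` float samples starting at `start_index` via mmap."""
    with open(path, "rb") as fh:
        # Length 0 maps the whole file; the OS pages data in on demand.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Reinterpret the mapped bytes as 32-bit floats without copying.
            with memoryview(mapped) as raw, raw.cast("f") as view:
                with view[start_index:start_index + count] as chunk:
                    return chunk.tolist()  # copy out only the requested range


def demo():
    """Write 1000 samples to a temporary file and read five of them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.sigmf-data")
        with open(path, "wb") as fh:
            array.array("f", [float(i) for i in range(1000)]).tofile(fh)
        return read_slice_mmap(path, 10, 5)
```

The memoryview objects are released explicitly because a mapping cannot be closed while a view of it is still exported.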

Mar 01 '24 16:03 csylvain

SigMFArchiveReader already uses mmap(), check it out.

There definitely are gaps in the implementation of it and of SigMFFile; contributions welcome.

Glen

Mar 01 '24 16:03 gmabey

I hadn't looked at SigMFArchiveReader because my dataset is just one file of pre-computed IQ samples. Thanks for calling my attention to it.

I see archive reading uses NumPy's memmap(). From the NumPy documentation: "NumPy’s memmap’s are array-like objects. This differs from Python’s mmap module, which uses file-like objects." However, read_samples() uses _read_datafile(), which looks like it is using a file-like mechanism.

I can already report that with the existing read_samples() implementation, partial reads work smoothly on the 512 MB RAM device, whereas an all-at-once read suffers from random TX underruns.

Mar 01 '24 20:03 csylvain

@csylvain were you able to get things working?

At the moment you are correct: read_samples() opens and closes the file. However, as gmabey suggests, you can instead read the memory-mapped version by slicing the file. For example, for most datatypes these should yield the same data:

some_samples = meta.read_samples(start_index=10, count=100)  # using file open/close
some_samples = meta[10:110]  # uses memmap; the slice stop is exclusive, so 10:110 spans 100 samples

There are pros & cons to memmap vs read, but maybe we should be more clear about it.
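To illustrate the difference between the two access styles without depending on sigmf-python, here is a small NumPy-only sketch. It assumes float32 samples and a hypothetical file name. It compares an explicit read (start plus count) against a slice of a memory map, where the slice stop is exclusive and so must be start + count.

```python
import os
import tempfile

import numpy as np


def compare_read_and_memmap(start=10, count=100):
    """Read the same sample range two ways and return both results."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.sigmf-data")
        np.arange(1000, dtype=np.float32).tofile(path)

        itemsize = np.dtype(np.float32).itemsize
        # Explicit read: the file is opened, read from a byte offset, and closed.
        by_read = np.fromfile(
            path, dtype=np.float32, count=count, offset=start * itemsize
        )

        # Memory map: slicing touches only the pages that back the slice.
        mapped = np.memmap(path, dtype=np.float32, mode="r")
        by_slice = np.array(mapped[start:start + count])  # copy out of the mapping
        del mapped  # drop the mapping before the temporary directory is removed
        return by_read, by_slice
```

Both calls return the same 100 samples here. The practical difference is where the buffering happens: in a Python-side read, or in the OS page cache.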

Dec 20 '24 18:12 Teque5

I did manage to get it working, but the exact details I do not presently recall. I did not have a SigMF archive. I did have pre-synthesized samples in a SigMF file. I do recall at first the roughly 180 MB of samples did not get streamed to the SDR smoothly. I may have abandoned SigMF entirely for a file of raw samples.

I was not aware of slicing the file, and I would recommend adding that to the documentation.

Thank you for following up!

Dec 24 '24 02:12 csylvain
