mdfreader
[Improvement] - unzip to RAM instead of disk
Python version
3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Platform information
Windows-10-10.0.18362-SP0
Numpy version
1.20.1
mdfreader version
4.1
Description
https://github.com/ratal/mdfreader/blob/d1822ee4aa2b466ef0412756bef47ebd5a840dc3/mdfreader/mdf.py#L696
Passing in a zipped .dat file to Mdf like
yop = mdfreader.Mdf(file_name='DatFile.zip')
will result in the .zip file being extracted to my working directory. Is it possible to extract the zip into RAM instead of SSD/HDD?
When using the multiprocessing library, the bottleneck becomes SSD read/write speed, so I am wondering whether this could be sped up by using RAM instead.
I'm not sure whether zipfile.ZipFile.read() or .open() would work; io.BytesIO might also do the trick. Most solutions for 'unzip to RAM' assume the file is being requested over the internet, but here the zip is local, and once extracted its contents would fit in RAM.
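For illustration, a minimal sketch of the 'unzip to RAM' idea with zipfile plus io.BytesIO on a local archive (the file name and the single-member assumption are made up for the example, and whether Mdf can accept a file object instead of a path would need checking):

import io
import zipfile

# Hypothetical archive layout: a zip containing a single .dat member.
with zipfile.ZipFile('DatFile.zip') as zf:
    member = zf.namelist()[0]   # assume the first member is the .dat file
    data = zf.read(member)      # decompress the whole member into RAM

# Wrap the decompressed bytes in a seekable in-memory file object.
dat_buffer = io.BytesIO(data)
dat_buffer.seek(0)

Whether this buffer could then be handed to Mdf in place of a path depends on whether mdfreader accepts file-like objects; otherwise it would need a small change on the library side.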
Thanks
Thanks for the idea, it could be investigated. ZipFile allows read() and seek(), so it could read the file transparently while decompressing it, but I do not think it loads the complete file into memory. In the end, if there is a lot of pointer travel in the file (which can happen, as the blocks to read can be scattered all over it), it could lead to a performance penalty while keeping the memory impact low. I guess it should be benchmarked. BytesIO, on the other hand, seems to load everything into memory, but I am wondering whether it is appropriate for all use cases; if going in this direction, I would recommend making it optional.
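For example, making it optional could look roughly like the sketch below; the open_zipped_dat helper and the in_memory flag are hypothetical, not existing mdfreader API:

import io
import tempfile
import zipfile

def open_zipped_dat(zip_path, in_memory=False):
    # Hypothetical helper: returns a readable, seekable file object
    # for the first member of the archive.
    zf = zipfile.ZipFile(zip_path)
    member = zf.namelist()[0]
    if in_memory:
        # Decompress the whole member into RAM: seek() is then cheap,
        # at the cost of holding the uncompressed data in memory.
        return io.BytesIO(zf.read(member))
    # Otherwise extract to a temporary directory on disk, which is
    # roughly the current behaviour but keeps the working directory clean.
    tmp_dir = tempfile.mkdtemp()
    return open(zf.extract(member, path=tmp_dir), 'rb')

Note that ZipFile.open() also returns a file object that supports seek() on Python 3.7+, but seeking backwards re-decompresses from the start of the member, which is exactly the pointer-travel penalty mentioned above.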
Thanks for your thoughts. I will try to learn more about BytesIO and see if I can implement something.