aiofiles
Read lines from a text file, compressed or not (txt, txt.gz, txt.bz2), allowing lines to be processed while further lines are loaded in the background, with support for buffering, encoding, and detailed logging settings.
See an example (plain txt only) at: https://gist.github.com/91b58e3ab8e10025cfa4a5935bcfaaa4.
To also read compressed files, it could look like this:
import asyncio
import bz2
import gzip
import logging
import os
from contextlib import asynccontextmanager

LOGGER = logging.getLogger(__name__)
BUFFER_HINT = 64 * 1024  # assumed minimum buffer hint; the original value is not shown


@asynccontextmanager
async def async_read_txt_file(filename: str,
                              buffer_hint: int = -1,
                              encoding='utf-8',
                              errors=None,
                              verbose=False):
    previous_level = LOGGER.level
    if verbose:
        LOGGER.setLevel(logging.DEBUG)
    try:
        # Pick the opener from the file extension.
        if filename.endswith('.gz'):
            open_file = gzip.open
        elif filename.endswith('.bz2'):
            open_file = bz2.open
        else:
            open_file = open
        multiply_buffer = 3 if filename.endswith('.bz2') else 1
        buffer_hint = max(buffer_hint, BUFFER_HINT)
        buffer_hint = min(buffer_hint, os.path.getsize(filename) * multiply_buffer)
        kwargs = {'mode': 'rt'}
        if encoding is not None:
            kwargs['encoding'] = encoding
        if errors is not None:
            kwargs['errors'] = errors
        LOGGER.info(f"Opening file {filename} with buffer hint {buffer_hint} "
                    f"and keyword arguments {kwargs}...")
        with open_file(filename, **kwargs) as opened_file:
            def _readlines_():
                LOGGER.debug(f"Reading lines from file {filename}")
                # may be slow as it has disk access
                lines = opened_file.readlines(buffer_hint)
                if lines:
                    LOGGER.debug(f"{len(lines)} lines read from file {filename}")
                else:
                    LOGGER.debug(f"End of file reading: {filename}")
                return lines

            async def _gen_():
                # Read the first batch in a worker thread so the event loop is not blocked.
                lines = await asyncio.to_thread(_readlines_)
                while lines:
                    # Load the next batch in the background while this one is consumed.
                    task = asyncio.create_task(asyncio.to_thread(_readlines_))
                    for line in lines:
                        yield line
                    lines = await task

            yield _gen_()
    finally:
        # Restore the previous log level even if the caller's block raises.
        LOGGER.setLevel(previous_level)
I think what you're asking for here is out of scope for aiofiles. However, I would take async versions of GzipFile/BZ2File if someone were to contribute quality implementations.
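For illustration only, here is a rough sketch of what an async wrapper over `gzip.open`/`bz2.open` might look like. It is not part of aiofiles, and the names `AsyncCompressedTextReader` and `count_lines` are hypothetical. It assumes Python 3.9+ (for `asyncio.to_thread`) and simply delegates each blocking call to a worker thread; a quality implementation would likely need more (binary mode, `read`/`seek`, configurable executors).

```python
import asyncio
import bz2
import gzip


class AsyncCompressedTextReader:
    """Hypothetical async text reader for .gz, .bz2 and plain files (not an aiofiles API)."""

    def __init__(self, filename, encoding="utf-8", errors=None):
        self._filename = filename
        self._encoding = encoding
        self._errors = errors
        self._file = None

    def _open(self):
        # Pick the opener from the file extension, as in the snippet above.
        if self._filename.endswith(".gz"):
            opener = gzip.open
        elif self._filename.endswith(".bz2"):
            opener = bz2.open
        else:
            opener = open
        return opener(self._filename, mode="rt",
                      encoding=self._encoding, errors=self._errors)

    async def __aenter__(self):
        # Opening touches the disk, so it runs in a worker thread.
        self._file = await asyncio.to_thread(self._open)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.to_thread(self._file.close)

    async def readline(self):
        # Decompression and disk access happen off the event loop.
        return await asyncio.to_thread(self._file.readline)

    def __aiter__(self):
        return self

    async def __anext__(self):
        line = await self.readline()
        if not line:
            raise StopAsyncIteration
        return line


async def count_lines(path):
    """Usage example: iterate over the file without blocking the event loop."""
    count = 0
    async with AsyncCompressedTextReader(path) as reader:
        async for _ in reader:
            count += 1
    return count
```

One thread hop per line is simple but costly for large files; reading batches with `readlines(hint)`, as the snippet in the issue does, amortizes that overhead.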