streamz
Non-blocking File I/O
Streamz.from_textfile could probably be improved to read from files in a non-blocking fashion and to emit new data only once full lines have been written. We'll probably need a separate thread to do this well.
This would make a common and important class of solutions much more robust, and move this project from the "interesting" to "pragmatically useful" status in many cases.
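A minimal sketch of the separate-thread idea, assuming a plain text file that another process appends to. The names tail_lines and start_tail, the queue hand-off, and the polling interval are illustrative assumptions, not existing streamz API. The reader thread holds back any trailing text until its newline arrives, so consumers only ever see whole lines.

```python
import queue
import threading
import time


def tail_lines(path, out, stop, poll_interval=0.05):
    """Push each complete line appended to `path` onto the queue `out`."""
    pending = ""  # text read so far that is not yet terminated by "\n"
    with open(path, "r") as f:
        while not stop.is_set():
            chunk = f.readline()
            if not chunk:
                # Nothing new yet: sleep in this thread, not in the caller.
                time.sleep(poll_interval)
                continue
            pending += chunk
            if pending.endswith("\n"):
                out.put(pending)
                pending = ""


def start_tail(path):
    """Start the reader thread; returns (queue, stop_event, thread)."""
    out = queue.Queue()
    stop = threading.Event()
    t = threading.Thread(target=tail_lines, args=(path, out, stop), daemon=True)
    t.start()
    return out, stop, t
```

The consumer would then pull from the queue (for example with out.get(timeout=...)) and set the stop event when it is done.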
cc @martindurant @yuvipanda
Referencing conversation in gitter here: https://gitter.im/dask/dev?at=5aa812f78f1c77ef3ab81dfd
Also, just as an FYI, I'm not working much and have limited connectivity this week.
f9484ba in #150 may be useful here. I don't think there's a need to attempt to do our own buffering with select or somesuch. Data appearing in a file is usually line-buffered anyway.
Data appearing in a file is usually line-buffered anyway
I think that for infrastructural work like this we need things to work all the time, not just usually. Recall also that this might not be a file, but a socket or other such file-like object. Line breaks (or delimiters of any sort) may not occur transactionally with reads, so a single read can return data that ends partway through a line.
Actually, keeping a 1-line buffer should be simple enough. Is #150 too overloaded to add it there?
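For illustration, a 1-line buffer could look roughly like this. LineBuffer and feed are hypothetical names, not code from #150. The buffer accepts chunks of any size, from a file or a socket, and returns only the lines that the latest chunk completed.

```python
class LineBuffer:
    """Accumulate arbitrary chunks and release only delimiter-terminated lines."""

    def __init__(self, delimiter="\n"):
        self.delimiter = delimiter
        self._tail = ""  # incomplete trailing text carried over between reads

    def feed(self, data):
        """Add `data`; return the list of lines it completed (delimiter kept)."""
        self._tail += data
        # Everything before the last delimiter is complete; the rest is kept.
        *lines, self._tail = self._tail.split(self.delimiter)
        return [line + self.delimiter for line in lines]
```

Because the state is a single string, the same object works whether a read returns half a line or several lines at once.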
Might be of interest to take a look at https://github.com/Tinche/aiofiles
aiofiles exists as a conda package, but only for py35/36.
https://github.com/conda-forge/aiofiles-feedstock seems to be noarch, so it should work with any Python version greater than 3.6.
The various file and socket IO stuff could all be converted. I'm pretty sure it'd make little difference in the case of (local) files, but since we are already all-in for tornado/async, it would make sense.
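As a rough idea of what a converted file reader could look like, here is a sketch that uses only the standard library's asyncio and hands each blocking call to the default thread-pool executor. Using plain asyncio instead of tornado or aiofiles is an assumption made to keep the example self-contained, and read_lines is a hypothetical name. This version stops at end of file and does not wait for new data.

```python
import asyncio


async def read_lines(path):
    """Yield lines from `path`, running each blocking call in the default executor."""
    loop = asyncio.get_running_loop()
    f = await loop.run_in_executor(None, open, path)
    try:
        while True:
            # The event loop stays free while a worker thread does the read.
            line = await loop.run_in_executor(None, f.readline)
            if not line:
                break  # end of file
            yield line
    finally:
        await loop.run_in_executor(None, f.close)
```

It would be consumed with "async for line in read_lines(path)" inside a coroutine.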