Non-blocking File I/O

Open mrocklin opened this issue 6 years ago • 7 comments

Stream.from_textfile could probably be improved to read from files in a non-blocking fashion and to emit new data only once full lines have been written. We'll probably need a separate thread to do this well.
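A minimal sketch of that idea, not streamz's actual implementation: a background thread polls the file, keeps any unterminated trailing text in a buffer, and calls a callback only for complete lines. The names `follow_lines`, `emit`, `stop`, and `poll_interval` are hypothetical and chosen for illustration.

```python
import threading


def follow_lines(path, emit, stop, poll_interval=0.05):
    """Poll `path` and call `emit(line)` once per complete line.

    Runs until the `stop` event is set. Text that is not yet terminated
    by a newline stays in `pending` and is never emitted early.
    """
    pending = b""
    with open(path, "rb") as f:
        while not stop.is_set():
            chunk = f.read()
            if not chunk:
                # Nothing new yet: sleep, but wake up promptly on stop.
                stop.wait(poll_interval)
                continue
            pending += chunk
            # Everything before the last newline is complete; the rest waits.
            *complete, pending = pending.split(b"\n")
            for line in complete:
                emit(line.decode("utf-8"))


def start_follower(path, emit):
    """Start `follow_lines` in a daemon thread; return (thread, stop_event)."""
    stop = threading.Event()
    thread = threading.Thread(
        target=follow_lines, args=(path, emit, stop), daemon=True
    )
    thread.start()
    return thread, stop
```

Here `emit` is called from the background thread, so a real source would still need to hand each line over to the event loop safely.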

This would make a common and important class of solutions much more robust, and move this project from the "interesting" to "pragmatically useful" status in many cases.

cc @martindurant @yuvipanda

Referencing conversation in gitter here: https://gitter.im/dask/dev?at=5aa812f78f1c77ef3ab81dfd

Also, just as an FYI, I'm not working much and have limited connectivity this week.

mrocklin avatar Mar 14 '18 02:03 mrocklin

f9484ba in #150 may be useful here. I don't think there's a need to attempt to do our own buffering with select or somesuch. Data appearing in a file is usually line-buffered anyway.

martindurant avatar Mar 15 '18 23:03 martindurant

> Data appearing in a file is usually line-buffered anyway

I think that for infrastructural work like this we need things to work all the time, not just usually. Recall also that this might not be a file, but a socket or other file-like object. Line breaks (or delimiters of any sort) may not line up with read boundaries, so a single read can return a partial line.

mrocklin avatar Mar 15 '18 23:03 mrocklin

Actually, keeping a 1-line buffer should be simple enough. Is #150 too overloaded to add it there?
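For illustration, a one-line buffer of this kind might look like the sketch below. It is not the code from #150; `LineBuffer` and `feed` are hypothetical names. Each chunk is appended to whatever was left over from the previous read, and only the parts that end in the delimiter are handed back.

```python
class LineBuffer:
    """Accumulate raw chunks and hand back only complete lines."""

    def __init__(self, delimiter=b"\n"):
        self.delimiter = delimiter
        self._tail = b""  # bytes seen since the last delimiter

    def feed(self, chunk):
        """Add raw bytes; return the list of lines completed by this chunk."""
        parts = (self._tail + chunk).split(self.delimiter)
        # The final element is either empty (the chunk ended on a delimiter)
        # or an unfinished line; keep it for the next call either way.
        self._tail = parts.pop()
        return parts


buf = LineBuffer()
first = buf.feed(b"alpha\nbe")   # [b"alpha"]; b"be" is held back
second = buf.feed(b"ta\ngam")    # [b"beta"]; b"gam" is held back
third = buf.feed(b"ma\n")        # [b"gamma"]; nothing is held back
```

Because the buffer only sees bytes, the same class would work for a socket or any other source that delivers data in arbitrary chunks.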

martindurant avatar Mar 16 '18 00:03 martindurant

Might be of interest to take a look at https://github.com/Tinche/aiofiles

mariusvniekerk avatar Apr 27 '18 19:04 mariusvniekerk

aiofiles exists as a conda package, but only for py35/36.

martindurant avatar Apr 28 '18 14:04 martindurant

https://github.com/conda-forge/aiofiles-feedstock seems to be noarch, so it should work with any Python version from 3.6 onward.

CJ-Wright avatar Aug 14 '20 15:08 CJ-Wright

The various file and socket IO stuff could all be converted. I'm pretty sure it'd make little difference in the case of (local) files, but since we are already all-in on tornado/async, it would make sense.
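As a rough sketch of what an async conversion could look like using only the standard library: the blocking `read` call is pushed onto the default thread-pool executor so the event loop stays free. aiofiles takes a broadly similar approach of delegating file operations to a thread pool. The function name `read_lines_async` and its parameters are hypothetical, and this version reads to end-of-file once instead of following the file.

```python
import asyncio


async def read_lines_async(path, emit, chunk_size=4096):
    """Read `path` without blocking the event loop.

    Calls `emit(line)` for each complete line (as bytes) and returns any
    trailing bytes that were not terminated by a newline.
    """
    loop = asyncio.get_running_loop()
    tail = b""
    with open(path, "rb") as f:
        while True:
            # The blocking read runs in the default thread-pool executor.
            chunk = await loop.run_in_executor(None, f.read, chunk_size)
            if not chunk:
                break
            *lines, tail = (tail + chunk).split(b"\n")
            for line in lines:
                emit(line)
    return tail
```

It can be driven with `asyncio.run(read_lines_async(path, print))`. For local files the benefit is likely small, as noted above, since the work still happens in a thread.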

martindurant avatar Aug 14 '20 15:08 martindurant