fst
Feature request: Support streaming
Thanks for this wonderful project! Please provide a streaming facility (ideally multiprocess-safe) for reading data in chunks and processing it.
Here is my experimental implementation for your reference.
Hi @talegari ,
thanks for your feature request and posting your code!
Yes, support for streaming would be nice, and your example is already very useful for processing a fst file in chunks.
Also, when write streams are added, things can get even more interesting, because then you can process large files (in chunks) even when they don't fit in RAM. The results can be streamed to a new fst file to be processed further. Write streams would also be useful for converting csv files to fst files with small memory overhead (and vice versa).
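To make the chunked read-process-write idea above concrete, here is a minimal, language-agnostic sketch in Python. It does not use fst or any fst API: the function `process_in_chunks`, its parameters, and the `keep_large` filter are hypothetical names invented for illustration, and plain CSV text streams stand in for the source and destination files. The point is only that memory use is bounded by the chunk size rather than the file size.

```python
import csv
import io
from itertools import islice


def process_in_chunks(src, dst, chunk_rows, transform):
    """Read rows from `src` in chunks, transform each chunk, and append
    the result to `dst`. Returns the number of data rows written.

    Hypothetical helper for illustration; not part of fst.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    # Copy the header row once, before any data chunk is processed.
    writer.writerow(next(reader))

    written = 0
    while True:
        # Only `chunk_rows` rows are held in memory at any time.
        chunk = list(islice(reader, chunk_rows))
        if not chunk:
            break
        out = transform(chunk)
        writer.writerows(out)
        written += len(out)
    return written


def keep_large(chunk):
    """Example transform: keep rows whose second column is at least 30."""
    return [row for row in chunk if int(row[1]) >= 30]


# In-memory streams stand in for a large input file and a new output file.
src = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")
dst = io.StringIO()
rows_written = process_in_chunks(src, dst, chunk_rows=2, transform=keep_large)
```

With real files, `src` and `dst` would be opened file handles, so the same loop could in principle convert or filter a file that does not fit in RAM.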
Would random access be interesting for streaming? For example, a seek(row_nr) method could re-position the current row to a new position in the table (a bit like fstream in C++).
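As a rough illustration of what such a `seek(row_nr)` method could look like, here is a small Python sketch of a chunk cursor over an in-memory table. Everything in it is an assumption made for illustration: the class `ChunkCursor`, its methods `seek`, `tell` and `read_chunk`, and the zero-based row numbering are hypothetical and not part of fst (R itself counts rows from 1).

```python
class ChunkCursor:
    """Hypothetical chunk reader with a repositionable current row."""

    def __init__(self, rows, chunk_size):
        if chunk_size < 1:
            raise ValueError("chunk_size must be at least 1")
        self._rows = rows
        self._chunk_size = chunk_size
        self._pos = 0  # zero-based index of the next row to read

    def seek(self, row_nr):
        """Move the current row to `row_nr` (zero-based)."""
        if not 0 <= row_nr <= len(self._rows):
            raise IndexError("row_nr is outside the table")
        self._pos = row_nr

    def tell(self):
        """Return the index of the next row to be read."""
        return self._pos

    def read_chunk(self):
        """Return up to `chunk_size` rows and advance the current row.

        An empty list signals that the end of the table was reached.
        """
        chunk = self._rows[self._pos:self._pos + self._chunk_size]
        self._pos += len(chunk)
        return chunk


table = list(range(10))          # stand-in for the rows of a table
cursor = ChunkCursor(table, chunk_size=4)

first = cursor.read_chunk()      # rows 0..3
cursor.seek(8)                   # jump ahead, skipping rows 4..7
tail = cursor.read_chunk()       # the last two rows
end = cursor.read_chunk()        # nothing left to read
cursor.seek(0)                   # reset to the start of the table
again = cursor.read_chunk()      # rows 0..3 once more
```

A file-backed version would translate `row_nr` into a read offset instead of slicing a list, but the interface could stay the same.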
@MarcusKlik Thanks for the quick reply. Yes, random access via seek will be helpful in resetting the current position.
The above experimental implementation assumes a static fst file; appending and/or write-stream functionality would add complexity and require further design decisions. As for file locking, there might be smarter lock-free methods for working with multiple processes that are worth exploring.
Again, thanks for supporting this feature request. It brings fst closer to production use cases.