rio

Feature Request: optionally parallelize `convert`

Open juansebastianl opened this issue 2 years ago • 0 comments

Hello! The `convert` function is a very simple wrapper around the read and write operations for the individual file types. For file types with either a chunked API or `skip` and `max_rows`-style parameters, it would be possible to read parts of the file in parallel. The pieces could then either be written as separate output files, or kept in memory, combined, and written once (writing in parallel is likely trickier).

In many cases this would give noticeable speedups at the cost of higher CPU and memory usage (e.g. csv to dta). In other cases it either makes no sense at all or does not improve performance (for data that can't be chunked). Nevertheless, since most of the data R users work with follows the basic row-column layout, this would cover a lot of useful data types. I think this could be implemented with something as simple as an `n_workers` argument and `future.apply` in the background. I would love to hear thoughts on the suggestion!
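To make the proposal concrete, here is a minimal, language-neutral sketch in Python. rio itself is R, and a real implementation would presumably use `future.apply` rather than threads. The sketch follows the "read chunks in parallel, combine in memory, write once" strategy. Each worker re-opens the CSV and keeps only its own data-row range `[skip, skip + max_rows)`, which is the simplest way to emulate a `skip`/`max_rows` reader. Several things here are assumptions: `parallel_convert` and `read_chunk` are hypothetical names, and tab-separated output stands in for a target format like dta. Threads are used only to keep the example self-contained. Because of Python's GIL they will not actually speed up CSV parsing, so a real version would need separate processes or R sessions.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_chunk(path, skip, max_rows):
    """Return data rows [skip, skip + max_rows) of a CSV file (header excluded)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # discard the header; every chunk shares it
        return list(islice(reader, skip, skip + max_rows))


def parallel_convert(in_path, out_path, n_workers=2):
    """Hypothetical chunked convert: read row ranges concurrently, combine, write once."""
    # First pass: grab the header and count data rows to size the chunks.
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        n_rows = sum(1 for _ in reader)

    chunk_size = max(1, -(-n_rows // n_workers))  # ceiling division
    starts = range(0, n_rows, chunk_size)

    # pool.map returns results in input order, so chunks stay in row order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        chunks = list(pool.map(lambda s: read_chunk(in_path, s, chunk_size), starts))

    # Combine in memory and write a single output file (TSV as a stand-in format).
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(header)
        for rows in chunks:
            writer.writerows(rows)
    return n_rows
```

Writing once at the end sidesteps the harder problem of concurrent writers. The trade-off is that the whole dataset must fit in memory, which matches the extra memory cost mentioned above.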

juansebastianl, Sep 25 '21 07:09