
Parallel computation of multiple columns

Open MarcusKlik opened this issue 8 years ago • 0 comments

When reading a fst file using multiple cores, the slowest operation is the creation of character vectors (and, to some extent, factors). That's because R uses a global string pool, and each string in a character vector has to be created through R's internal code, which searches that pool. This operation can only be done on the main thread. So a large performance gain can be achieved if these slow 'main thread' operations (creating strings in character vectors) are started immediately after opening the fst file, while the non-character columns are processed in parallel on other threads.

For this to work, we need two file pointers reading the fst file simultaneously: one to read the character columns and another to read the other column types.
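To make the idea concrete, here is a minimal sketch of the scheduling pattern in Python. It is not fst's implementation and does not use the fst file format. The toy file layout and the function names (`write_toy_file`, `read_int_column`, `read_string_column`, `read_toy_file`) are all assumptions made up for illustration, and a plain dictionary stands in for R's global string pool. The point it shows is that each reader opens its own file handle, the numeric column is handed to a worker thread right away, and the string column is built on the main thread while the worker runs.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy layout (NOT the fst format):
#   header : two uint64 values -> row count, byte offset of the string block
#   block 1: row count x int64
#   block 2: row count x (uint32 length + UTF-8 bytes)
HEADER = struct.Struct("<QQ")


def write_toy_file(path, ints, strings):
    """Write one integer column and one string column to `path`."""
    assert len(ints) == len(strings)
    n = len(ints)
    str_offset = HEADER.size + 8 * n
    with open(path, "wb") as f:
        f.write(HEADER.pack(n, str_offset))
        f.write(struct.pack(f"<{n}q", *ints))
        for s in strings:
            raw = s.encode("utf-8")
            f.write(struct.pack("<I", len(raw)))
            f.write(raw)


def read_int_column(path):
    """Runs on a worker thread with its own file handle."""
    with open(path, "rb") as f:
        n, _ = HEADER.unpack(f.read(HEADER.size))
        return list(struct.unpack(f"<{n}q", f.read(8 * n)))


def read_string_column(path, pool):
    """Runs on the main thread with a second, independent file handle.

    `pool` stands in for a global string pool that only the main
    thread is allowed to touch.
    """
    with open(path, "rb") as f:
        n, str_offset = HEADER.unpack(f.read(HEADER.size))
        f.seek(str_offset)
        out = []
        for _ in range(n):
            (length,) = struct.unpack("<I", f.read(4))
            s = f.read(length).decode("utf-8")
            # Look the string up in the pool; insert it if it is new.
            out.append(pool.setdefault(s, s))
        return out


def read_toy_file(path):
    """Start the worker first, then do the string work on the main thread."""
    pool = {}
    with ThreadPoolExecutor(max_workers=1) as executor:
        int_future = executor.submit(read_int_column, path)  # worker thread
        strings = read_string_column(path, pool)             # main thread
        ints = int_future.result()
    return ints, strings, pool
```

Because the two readers never share a file handle, neither has to coordinate seeks with the other. In this sketch only the main thread writes to `pool`, which mirrors the constraint described above.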

Thanks @petermuller71 for this very interesting feature request!

MarcusKlik, May 10 '17 08:05