loomR
Operating on loom files in parallel
Reading your source code, I noticed many for loops in the validation of matrices, layers, and attributes that could be parallelized, and chunk-based operations on matrices stored in loom files are often embarrassingly parallel. Parallel operations on loom objects seem to be possible; see http://www.msmith.de/2018/05/01/parallel-r-hdf5/. Could you provide parallel options for whatever can be parallelized, along with parallel versions of map and apply? This could greatly speed up operations on large datasets, which are really slow at present.
This is something that we'd like to do, but we have put it lower on our to-do list as we're still working on getting loomR stable and submitted to CRAN. We also need to check whether hdf5r (the HDF5 library we use for reading and writing HDF5 files) supports parallel I/O. If it does, we would very much like to implement it.