bcolz support for parallel writes
Hi there,
I wonder whether bcolz can support parallel writes to the same file at all (PyTables, for example, cannot).
Thank you, Anton
Hi there,
Out of interest, would you like to use multiple threads or multiple processes?
Also, do you want to append in parallel, or update existing rows in parallel?
Cheers, Alistair
Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: [email protected] Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721
Hi Alistair, thanks for your interest!
In my case I want to use multiple processes to append in parallel.
Cheers, Anton
Hi Anton,
My understanding is that bcolz does not provide any kind of write synchronization. If you want to append from multiple processes, you would need to manage some kind of lock in your own application. If all your processes are spawned from a single Python process via the multiprocessing module, one option may be to use the multiprocessing.Lock class. Alternatively, there is a third-party Python package called fasteners [1], which provides an inter-process lock based on file locking and may be a more flexible option.
Hth, Alistair
[1] https://fasteners.readthedocs.io/en/latest/api/process_lock.html
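As a rough illustration of the multiprocessing.Lock option, here is a minimal sketch that uses only the standard library. The helper names (`append_rows`, `worker`, `main`) are made up for this example, and a plain text file stands in for the shared container. With bcolz, the body of `append_rows` would presumably open the on-disk ctable in append mode, append the rows, and flush, all while the lock is held; that part is an assumption and is not shown.

```python
import multiprocessing as mp
import os
import tempfile


def append_rows(path, rows):
    """Stand-in for the real append (hypothetical helper).

    A plain text file plays the role of the shared container here; the
    bcolz-specific open/append/flush would go in this function instead.
    """
    with open(path, "a") as f:
        for row in rows:
            f.write(f"{row}\n")


def worker(lock, path, worker_id, n_rows):
    # Prepare the data outside the lock so that work can run in parallel.
    rows = [f"{worker_id},{i}" for i in range(n_rows)]
    # Only one process at a time may touch the shared container.
    with lock:
        append_rows(path, rows)


def main():
    path = os.path.join(tempfile.mkdtemp(), "shared.csv")
    # The lock is created in the parent and passed to every child.
    lock = mp.Lock()
    procs = [
        mp.Process(target=worker, args=(lock, path, w, 1000))
        for w in range(4)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(path) as f:
        print(sum(1 for _ in f), "rows written")


if __name__ == "__main__":
    main()
```

The lock serializes the appends themselves, so only the work done before acquiring it (here, building `rows`) actually runs in parallel. This kind of lock also only helps when all writers are started from the same parent process; for unrelated processes a file-based lock such as the one in fasteners would be needed.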
You could write to a separate ctable in each process, using fixed chunk lengths, and then combine them at the end by changing the naming and the internal carray dicts and appending the "dangly ends" of each ctable. It would be hackish, but it should work. A pull request adding something like "ctable_merge([a, b, c])" would be nice. So: possible, but low-level work.
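A simpler, if slower, variant of the same idea is to let each process write its own part without any locking and then merge the parts sequentially in a single process. The sketch below shows only that pattern, using the standard library, with plain files standing in for the per-process ctables; `write_part` and `merge_parts` are hypothetical names. It copies the data during the merge, so it is not the low-level merge described above, which would avoid that copy by reusing the chunks on disk.

```python
import os
import tempfile
from multiprocessing import Pool


def write_part(args):
    """Write one worker's rows to its own file; no lock is needed."""
    out_dir, worker_id, n_rows = args
    part_path = os.path.join(out_dir, f"part-{worker_id:04d}.csv")
    with open(part_path, "w") as f:
        for i in range(n_rows):
            f.write(f"{worker_id},{i}\n")
    return part_path


def merge_parts(part_paths, merged_path):
    """Concatenate the parts in name order, in a single process."""
    with open(merged_path, "w") as out:
        for part_path in sorted(part_paths):
            with open(part_path) as part:
                out.write(part.read())
    return merged_path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    jobs = [(out_dir, w, 1000) for w in range(4)]
    with Pool(processes=4) as pool:
        parts = pool.map(write_part, jobs)
    merged = merge_parts(parts, os.path.join(out_dir, "merged.csv"))
    with open(merged) as f:
        print(sum(1 for _ in f), "rows merged")
```

With bcolz, the merge step would presumably open each per-process ctable in turn and append its contents to the final one, which keeps every write to the final container in one process.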
@ankravch In case you just need a multi-dimensional container (i.e. not a table-like interface), you might want to try zarr (https://github.com/alimanfoo/zarr). Zarr follows the same principles as bcolz (chunked, compressed containers), but with other bells and whistles (including fasteners support).
Thanks Francesc. Yes, zarr supports synchronizing write operations to an array: multiple threads or processes can safely append to and/or update the same array. There is some more info here: http://zarr.readthedocs.io/en/latest/tutorial.html#parallel-computing-and-synchronization. Zarr does not have any table abstraction, however, so you would have to work directly with arrays.