
bcolz support for parallel writes

Open ankravch opened this issue 9 years ago • 6 comments

Hi there,

I wonder whether it is possible at all for bcolz to support parallel writes to the same file (PyTables, for example, cannot do that)?

Thank you, Anton

ankravch avatar Sep 19 '16 21:09 ankravch

Hi there,

Out of interest, would you like to use multiple threads or multiple processes?

Also, do you want to append in parallel, or update existing rows in parallel?

Cheers, Alistair


Alistair Miles Head of Epidemiological Informatics Centre for Genomics and Global Health http://cggh.org The Wellcome Trust Centre for Human Genetics Roosevelt Drive Oxford OX3 7BN United Kingdom Email: [email protected] Web: http://purl.org/net/aliman Twitter: https://twitter.com/alimanfoo Tel: +44 (0)1865 287721

alimanfoo avatar Sep 20 '16 16:09 alimanfoo

Hi Alistair, thanks for your interest!

In my case I want to use multiple processes to append in parallel.

Cheers, Anton

ankravch avatar Sep 20 '16 18:09 ankravch

Hi Anton,

My understanding is that bcolz does not provide any kind of write synchronization. If you want to append from multiple processes, you would need to manage some kind of lock in your own application. If all your processes are spawned from a single Python process via the multiprocessing module, one option may be to use the multiprocessing.Lock class. Alternatively, there is a third-party Python package called fasteners [1], which provides an inter-process lock based on file locking and may be a more flexible option.

Hth, Alistair

[1] https://fasteners.readthedocs.io/en/latest/api/process_lock.html
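As a minimal sketch of the multiprocessing.Lock option, the following uses only the standard library. A plain text file stands in for the bcolz container, and the names `append_rows` and `run_workers` are hypothetical, chosen for illustration only. With bcolz, the guarded section would presumably open the on-disk container, append, and flush before releasing the lock.

```python
import multiprocessing as mp


def append_rows(lock, path, rows):
    """Append rows to a shared file while holding an inter-process lock."""
    # Only one process at a time can hold the lock, so the appends made by
    # different workers never interleave.
    with lock:
        # Assumption: with bcolz, this is where the on-disk container would
        # be opened, appended to, and flushed. A text file stands in here.
        with open(path, "a") as f:
            for row in rows:
                f.write(f"{row}\n")


def run_workers(path, n_workers=4, rows_per_worker=100):
    """Start worker processes that all append to the same file."""
    lock = mp.Lock()  # created once in the parent and shared with every worker
    procs = [
        mp.Process(
            target=append_rows,
            args=(lock, path, [f"{w}:{i}" for i in range(rows_per_worker)]),
        )
        for w in range(n_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Calling `run_workers(path)` from under an `if __name__ == "__main__":` guard should leave `n_workers * rows_per_worker` lines in the file. The lock only works for processes started from the same parent, which is the limitation that a file-based lock such as the one in fasteners avoids.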


alimanfoo avatar Sep 20 '16 19:09 alimanfoo

You could write to separate ctables with fixed chunk lengths in each process and then combine them at the end, changing the naming and internal carray dicts and then appending the "dangly ends" of each ctable. It would be hackish, but should work. Maybe a pull request adding "ctable_merge([a, b, c])" functionality would be nice. So: possible, but low-level work.
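
The general shape of this idea (one output per process, merged afterwards by a single process) can be sketched with the standard library alone. This is a simpler variant than the one described above: it copies the data again during the merge instead of renaming chunk files, and it uses plain text files as stand-ins for ctables. The names `write_shard` and `merge_shards` are hypothetical.

```python
import os


def write_shard(shard_dir, worker_id, rows):
    """Write one worker's rows to its own shard; no locking is needed."""
    path = os.path.join(shard_dir, f"shard-{worker_id:04d}.txt")
    with open(path, "w") as f:
        for row in rows:
            f.write(f"{row}\n")
    return path


def merge_shards(shard_dir, out_path):
    """Combine all shards into one file, in shard order, from one process."""
    n_rows = 0
    with open(out_path, "w") as out:
        # Sorting the zero-padded names gives a deterministic merge order.
        for name in sorted(os.listdir(shard_dir)):
            if not name.startswith("shard-"):
                continue  # skip anything that is not a shard, e.g. the output
            with open(os.path.join(shard_dir, name)) as f:
                for line in f:
                    out.write(line)
                    n_rows += 1
    return n_rows
```

Each worker process would call `write_shard` with its own `worker_id`, and the parent would call `merge_shards` once all workers have finished.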

CarstVaartjes avatar Sep 20 '16 22:09 CarstVaartjes

@ankravch In case you just need a multi-dimensional container (i.e., not a table-like interface), you might want to try zarr (https://github.com/alimanfoo/zarr). Zarr follows the same principles as bcolz (chunked, compressed containers), but with other bells and whistles (including fasteners support).

FrancescAlted avatar Sep 22 '16 08:09 FrancescAlted

Thanks Francesc. Yes, zarr has support for synchronizing write operations to an array. Multiple threads or processes can safely append to and/or update the same array; some more info here: http://zarr.readthedocs.io/en/latest/tutorial.html#parallel-computing-and-synchronization. Zarr does not have any table abstraction, however, so you would have to work directly with arrays.


alimanfoo avatar Sep 22 '16 08:09 alimanfoo