
Saving clusters takes forever

Open jfgonsalves opened this issue 1 year ago • 1 comment

Not sure if this will be addressed in the upcoming split of CytoPy into different tools, but saving clusters takes ages.

I'm running on bare metal in a fresh Conda env with a reasonably powerful Core i5. I've run CytoPy on an Apple M1 Mac Mini and have the same issue.

It looks like the save function is single-threaded: looking at top, I've got 1 thread fully loaded and 11 others sitting idle.

I'm a bit confused as to what CytoPy is doing with this function - is it writing to Mongo?

I wonder if there is a way we can split the save function into multiple threads. I'm happy to give it a go but wondering if @burtonrj could comment on feasibility? Just briefly skimming the Mongo docs, it seems like there has been support for concurrency for some time.

jfgonsalves, Aug 31 '22 01:08

Hi @jfgonsalves, it is currently single-threaded. Part of the problem is that I use Mongoengine, which registers connections globally (https://mongoengine-odm.readthedocs.io/guide/connecting.html). It was a long time ago now, but if I remember correctly, it was giving some horrible warning messages with multiprocessing.

With the new version, I want to have a user-defined config file for connection strings, and connections can be created and destroyed within each child process. This is a big refactor though. I'm very very close to finishing my thesis, so promise this will be getting my attention soon.

burtonrj, Aug 31 '22 08:08
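
For anyone who wants to experiment before the refactor lands, here is a rough sketch of the approach described above: read the connection string from a config file, then open and close the Mongoengine connection inside each child process. This is not CytoPy's code. The `[mongo]` / `uri` config layout, the `clusters` collection name, and the function names `load_uri`, `chunked`, `save_batch` and `parallel_save` are all hypothetical, and it assumes the cluster results can be flattened into plain dicts.

```python
import configparser
import multiprocessing as mp
from typing import Iterator, List, Sequence


def load_uri(config_text: str) -> str:
    """Read the connection string from INI text (hypothetical [mongo]/uri layout)."""
    # interpolation=None so '%' in URL-encoded passwords is left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser["mongo"]["uri"]


def chunked(items: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def save_batch(uri: str, batch: List[dict]) -> int:
    """Runs in a child process: connect, write one batch, disconnect."""
    # Imported lazily; the connection itself is opened here, inside the child,
    # so nothing is shared with the parent process.
    import mongoengine
    from mongoengine.connection import get_db

    mongoengine.connect(host=uri)
    try:
        # "clusters" is a hypothetical collection name for illustration.
        result = get_db()["clusters"].insert_many(batch)
        return len(result.inserted_ids)
    finally:
        mongoengine.disconnect()


def parallel_save(uri: str, rows: Sequence[dict],
                  batch_size: int = 1000, processes: int = 4) -> int:
    """Split rows into batches and save them across worker processes."""
    tasks = [(uri, batch) for batch in chunked(rows, batch_size)]
    with mp.Pool(processes=processes) as pool:
        return sum(pool.starmap(save_batch, tasks))
```

`parallel_save` needs a running MongoDB instance and should be called from under an `if __name__ == "__main__":` guard so that it behaves with the spawn start method (the default on macOS and Windows). Whether this is actually faster depends on where the time goes; if the bottleneck is serialising the cluster data rather than the writes, batching alone may help more than extra processes.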