
scGen model is very slow

Open joseph-siefert opened this issue 2 years ago • 9 comments

Thanks for the great tools! Is the scGen model in scArches using the newest scGen version? I can run scGen directly with no problem and it runs pretty fast, but when I run the scGen model through scArches it is very slow. I would like to use scArches so I can map query data onto the reference after correcting batch effects in the reference.

import scgen
import scarches as sca

# runs fine
scgen.SCGEN.setup_anndata(adata, batch_key="dataset", labels_key="cell_type")
model = scgen.SCGEN(adata)
model.train(
    max_epochs=100,
    batch_size=32,
    early_stopping=True,
    early_stopping_patience=25,
)

# super slow
epoch = 50
early_stopping_kwargs = {
    "early_stopping_metric": "val_loss",  # also tried elbo_metric; still very slow
    "patience": 20,
    "threshold": 0,
    "reduce_lr": True,
    "lr_patience": 13,
    "lr_factor": 0.1,
}
network = sca.models.scgen(adata=source_adata, hidden_layer_sizes=[256, 128])
network.train(n_epochs=epoch, early_stopping_kwargs=early_stopping_kwargs, use_gpu=True)
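
A quick way to quantify the difference is to time a few epochs of each `train` call. Here is a minimal timing helper (standard library only; the commented usage assumes the `model` and `network` objects from the snippets above):

```python
import time

def timed(fn, *args, **kwargs):
    # Run a callable and report its wall-clock duration in seconds.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{getattr(fn, '__name__', 'call')} took {elapsed:.1f}s")
    return result, elapsed

# Hypothetical usage with the models above (a short run is enough to compare):
# _, t_scgen = timed(model.train, max_epochs=5)
# _, t_sca = timed(network.train, n_epochs=5, use_gpu=True)
```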

Am I missing something, or is the scGen model in scArches not optimal?

joseph-siefert avatar Nov 10 '22 23:11 joseph-siefert

Hi, thanks for trying it out! Which step specifically is slower?

M0hammadL avatar Nov 11 '22 03:11 M0hammadL

The training

joseph-siefert avatar Nov 11 '22 03:11 joseph-siefert

It could be, since they are basically two different implementations. Are you training on CPU or GPU?

M0hammadL avatar Nov 11 '22 04:11 M0hammadL

GPU


joseph-siefert avatar Nov 11 '22 04:11 joseph-siefert

Have you checked whether the model is actually using the GPU?

@alextopalova could you plz check this out? And open an issue to import the scGen from the theislab repo (theislab/scgen) here? The only adaptation needed would be writing a function for batch correction and reference mapping, which is implemented here and would need to be adapted to that version.

M0hammadL avatar Nov 11 '22 04:11 M0hammadL

I'm not sure how to check. When I run scGen I get the output: GPU available: True (cuda), used: True. When I run scArches with the scGen model I do not get the same output. I do set the use_gpu=True flag, but I'm not sure how to verify that it is actually using the GPU.
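
For the record, one way to verify this is to inspect the device of the model's parameters. A minimal sketch (the attribute that holds the underlying torch module, e.g. `network.model`, is an assumption here and may differ between scArches versions):

```python
import torch

def model_device(module: torch.nn.Module) -> torch.device:
    # All parameters normally share one device; the first parameter
    # tells us where the weights actually live.
    return next(module.parameters()).device

# Hypothetical usage -- the attribute name is a guess, check your version:
# print(model_device(network.model))  # cuda:0 if training really uses the GPU
```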

Inspecting the resulting model shows that scArches uses the vaearith model of scGen. I don't see this stated explicitly when using scGen directly, but perhaps I don't know where this information is stored.

UPDATE: I believe it's running on the CPU. I've monitored GPU usage from another notebook, and as far as I can tell it is not using the GPU. Also, when I start the process, CPU usage goes from 0 to 100%. I can run scGen with the GPU in this same notebook, so it's not the environment. I have also enabled the use_gpu=True flag, but scArches still does not seem to be using the GPU for the scGen model.

torch.cuda.is_available()
True

I can even see the process logged on a CUDA device, but GPU utilization is 0% and CPU is 100%.

joseph-siefert avatar Nov 11 '22 17:11 joseph-siefert

It appears this may have to do with the size of the matrix. If I use a very small subset, GPU utilization is slightly higher (10-12% max), and if I use a larger subset, GPU utilization is around 1-2% max. CPU utilization is still intermittently quite high, but it will finish in a reasonable amount of time. For a very large matrix the runtime is too long (several days). It seems that either a CPU step is a bottleneck for very large matrices, or the data loading to the GPU is not optimal.
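
If host-to-device transfer is indeed the bottleneck, the standard PyTorch mitigations are pinned host memory and non-blocking copies. A generic sketch (this is not scArches' actual data loader, and the matrix here is random stand-in data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in for an expression matrix (real data would come from adata.X).
data = torch.randn(1024, 2000)
loader = DataLoader(
    TensorDataset(data),
    batch_size=128,
    shuffle=True,
    num_workers=0,  # raise (e.g. 2-4) to overlap CPU batching with GPU compute
    pin_memory=torch.cuda.is_available(),  # pinned pages speed up host-to-GPU copies
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for (batch,) in loader:
    batch = batch.to(device, non_blocking=True)  # async copy when memory is pinned
    # ... forward/backward pass would go here ...
```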

joseph-siefert avatar Nov 15 '22 20:11 joseph-siefert

It's probably the data loading during training, then, which doesn't seem to be super efficient. We will work on it, but that will take some time. Happy to merge a PR here if you have one.

M0hammadL avatar Nov 15 '22 21:11 M0hammadL

any updates on this?

kotr98 avatar Jan 25 '24 10:01 kotr98