Padelpy GPU version?

Open AnjaliSetiya opened this issue 2 years ago • 4 comments

Hello, I would like to know whether the library can run on a GPU. PaDELPy on the CPU is quite slow: generating fingerprints for about 10,000 molecules takes me around 3-4 hours, sometimes more. If a GPU version isn't available, how can the process be sped up? Please let me know. Thanks, Anjali

AnjaliSetiya, Apr 26 '22 12:04

Hi @AnjaliSetiya,

The source code for PaDEL-Descriptor, while open source, is written in Java which, I will not lie, is not a language I have much experience with.

I'm going to leave this issue open; hopefully someone with more familiarity with Java and/or PaDEL-Descriptor's source code can chime in (and let us know if this is possible!).

Best, Travis

tjkessler, Sep 08 '22 23:09

I don't know enough about GPU programming to accelerate this, and I think that would need to be done upstream in the actual PaDEL-Descriptor source code. What could be done here is to use Python's multiprocessing to divide the list of molecules across as many processes as possible. It won't get anywhere near the speedup of a true GPU implementation of the fingerprint calculation itself, but it should cut execution times substantially: there will be very little communication overhead, and I expect the speedup to scale linearly with the number of processes.
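A minimal sketch of that chunk-and-dispatch idea, not code from padelpy itself. The names `chunk`, `describe_chunk`, and `parallel_descriptors` are hypothetical, and the only padelpy behavior assumed is that `from_smiles` accepts a list of SMILES strings and returns one result per molecule. Each worker launches its own PaDEL-Descriptor Java process, so the actual gain would have to be measured.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal pieces."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one leftover item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def describe_chunk(smiles_chunk):
    """Worker: compute descriptors for one chunk (assumes padelpy is installed)."""
    # Imported here so this module can be loaded without padelpy present.
    from padelpy import from_smiles
    return from_smiles(smiles_chunk)


def parallel_descriptors(smiles, n_procs=4, worker=describe_chunk):
    """Run `worker` on each chunk in its own process and flatten the results."""
    if not smiles:
        return []
    chunks = chunk(smiles, n_procs)
    with ProcessPoolExecutor(max_workers=len(chunks)) as pool:
        # map() preserves chunk order, so results line up with the input list.
        parts = list(pool.map(worker, chunks))
    return [row for part in parts for row in part]


# Usage (keep the call under a main guard so worker processes can start cleanly):
# if __name__ == "__main__":
#     results = parallel_descriptors(["CCO", "CCN", "CCC", "c1ccccc1"], n_procs=2)
```

Splitting into contiguous chunks keeps the output in the same order as the input, which makes it easy to pair each result with its molecule afterwards.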

Please let me know if this is of any interest and I can open a PR @tjkessler @AnjaliSetiya

JacksonBurns, Feb 07 '23 15:02

Hi @JacksonBurns, please let me know what is needed for a PR.

AnjaliSetiya, Feb 13 '23 08:02

@AnjaliSetiya after further investigation I realized that padelpy already has a passthrough to PaDEL-Descriptor that takes advantage of multiprocessing. This should give you a large speedup if you aren't using it already: pass the whole list of SMILES in a single call instead of looping over the molecules. See the example:

This code snippet takes about 3.5 minutes to run:

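from padelpy import from_smiles

# 100 copies of a 50-carbon chain
smiles = ['C' * 50] * 100

# One call per molecule: PaDEL-Descriptor is launched again on every iteration
for smi in smiles:
    from_smiles(smi)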

whereas this takes only 11 seconds:

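from padelpy import from_smiles

# 100 copies of a 50-carbon chain
smiles = ['C' * 50] * 100

# One batched call: the whole list goes to PaDEL-Descriptor in a single run
from_smiles(smiles)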

As far as a GPU version goes, I'm not sure that's really possible. I can't even find the source code to begin with, and on top of that, descriptor calculation consists of many short, 'bursty' computations that probably wouldn't benefit much from a GPU. You could also consider this reimplementation, which seems to be much faster. Another compelling option is to run PaDEL-Descriptor directly, rather than through this Python wrapper, and save the output file to read into Python later.
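For that last option, here is a rough sketch of reading a saved descriptor file back into Python. It assumes the output is a CSV with one row per molecule and an identifier column called `Name`; that layout is an assumption, so adjust `id_column` if your file differs. `load_descriptor_csv` and `_to_float` are hypothetical helpers, not part of padelpy or PaDEL-Descriptor.

```python
import csv


def _to_float(text):
    """Convert a CSV cell to float; blank or non-numeric cells become None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def load_descriptor_csv(path, id_column="Name"):
    """Read a descriptor CSV into {molecule_id: {descriptor_name: value}}.

    Assumes a header row and an identifier column named `id_column`.
    """
    table = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            name = row.pop(id_column)
            table[name] = {key: _to_float(value) for key, value in row.items()}
    return table
```

Only the standard library is used, so the expensive descriptor run and the later analysis can happen in separate environments.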

JacksonBurns, Feb 28 '23 16:02