descriptastorus icon indicating copy to clipboard operation
descriptastorus copied to clipboard

Morgan fingerprint generator warnings

Open swansonk14 opened this issue 1 year ago • 2 comments

Hi @bp-kelley,

Thank you for the great package!

I noticed recently that with newer versions of RDKit (e.g., 2024.3.5), running the descriptastorus.descriptors.rdNormalizedDescriptors.RDKit2DNormalized descriptor prints out the warning message DEPRECATION WARNING: please use MorganGenerator. I believe this is due to a suggested change in how Morgan fingerprints are generated, as detailed here: https://greglandrum.github.io/rdkit-blog/posts/2023-01-18-fingerprint-generator-tutorial.html

Would you be able to update the Morgan fingerprint generation code to remove these deprecation warnings?

Thanks! Kyle

swansonk14 avatar Sep 02 '24 18:09 swansonk14

I'm releasing a new version next week, these warnings will be gone. Thanks for the updates.

bp-kelley avatar Sep 05 '24 16:09 bp-kelley

Great, thank you so much!!

swansonk14 avatar Sep 06 '24 04:09 swansonk14

Sorry for the delay, I had to fix an issue with fr_unbrch_alkane that was changing the distributions.

The master branch is now updated and you should be able to install from there while I get the newer version up to pypi.

bp-kelley avatar Sep 25 '24 14:09 bp-kelley

2.7.0 is up on PyPi

bp-kelley avatar Sep 25 '24 17:09 bp-kelley

Hi @bp-kelley,

Thank you so much for working on this!

Unfortunately, I still see a few issues after upgrading to 2.7.0.

  1. The warning DEPRECATION WARNING: please use MorganGenerator still appears after FpDensityMorgan1, FpDensityMorgan2, and FpDensityMorgan3.

  2. Now when running the RDKit2DNormalized generator, it prints out the name of every property (e.g., == BalabanJ, == BertzCT, == Chi0, etc.) for every molecule, which clutters the output.

  3. There seem to be some small changes with the fr_unbrch_alkane property. For example, the SMILES C=CCSCC(=O)N(CC)CC(CC)NC(=O)c1cccc(O)c1OC(F)(F)F has a fr_unbrch_alkane value of 4.7035982e-08 using version 2.6.1 but a value of 0.9607067 using version 2.7.0. (This discrepancy appeared for 6 out of the 177 molecules I tried.)

I'm sorry to ask for more help, but would you be able to look into these issues?

Thanks! Kyle

swansonk14 avatar Sep 26 '24 18:09 swansonk14

This looks very much like I didn't upload the correct build. Could you try the same tests with the master branch?

fr_branched_alkane changed in the latest rdkit, so I had to put a fix in, I'll double check to see what exactly was uploaded to pypi

bp-kelley avatar Sep 26 '24 19:09 bp-kelley

I yanked the release for now

bp-kelley avatar Sep 26 '24 19:09 bp-kelley

I see the "== " being output, I'll remove these, sorry about that.

Could you also try this, it looks like if qed doesn't exist in the rdkit, I fall back to an internal version which has the deprecations

from rdkit.Chem.Descriptors import qed

Finally, if you have your testing code that would be useful. the fr_unbranched_alkane changed in the rdkit and I had to readd the older version, but there may be a code path where you get the new one.

bp-kelley avatar Sep 26 '24 19:09 bp-kelley

I tried from rdkit.Chem.Descriptors import qed and it seems to work, so my version of RDKit has qed (I'm using version 2024.3.5).

I tried installing the version from the master branch but I'm having the same issues. Here's my test code:

from descriptastorus.descriptors import rdNormalizedDescriptors

generator = rdNormalizedDescriptors.RDKit2DNormalized()
smiles = "C=CCSCC(=O)N(CC)CC(CC)NC(=O)c1cccc(O)c1OC(F)(F)F"
rdkit_fp = generator.process(smiles)[1:]
print(rdkit_fp[197])

When I use version 2.6.1, I get 4.7035980879892365e-08.

However, when I use the version on master (at commit 3f7626ecdc7582df5a148600aa683e4bd768d216), I get 0.960706684375328.

Please let me know if there's anything else you would like me to try.

swansonk14 avatar Sep 27 '24 02:09 swansonk14

The difference is because the RDKit changed the definition of fr_unbrch_alkane and descriptastorus was fitted on the old version (== 0.97), to make all versions of descriptastorus the same, I reverted to the old version of the code. Version 2.6.1 with a newer rdkit would show the incorrect results, or at least, the results there were fitted with the old version.

This was changed here: https://github.com/rdkit/rdkit/pull/7255 in rdkit 2024.03, unfortunately this never ended up in the change list, so I didn't find it until I reran my regression tests

Once this PR gets sorted, I'll be releasing the next version that fits on the current results and adds some descriptors.

I'm still at a loss regarding the FpDensity descriptors, what version of the RDKit are you using.

bp-kelley avatar Sep 27 '24 12:09 bp-kelley

Ah okay I see, thank you for working on this! I'm using RDKit version 2024.3.5.

swansonk14 avatar Sep 27 '24 21:09 swansonk14

It should all be resolved now except for the change fr_unbranched_alkane, I'm using the older version for compatibility. Let me now if you need an option to use the newer one.

bp-kelley avatar Oct 06 '24 20:10 bp-kelley

https://pypi.org/project/descriptastorus

https://pypi.org/project/descriptastorus/2.7.0.4/

bp-kelley avatar Oct 06 '24 20:10 bp-kelley

Hi @bp-kelley,

Thanks for making those changes! It looks like the deprecation warning is gone which is great!

Just to clarify, for the fr_unbranched_alkane, should I expect to get the same values in version 2.6.1 and in version 2.7.0.5, or will they be different? I just tried the same example molecule from https://github.com/bp-kelley/descriptastorus/issues/32#issuecomment-2378279906, and I'm getting different values in version 2.6.1 and version 2.7.0.5.

Thanks, Kyle

swansonk14 avatar Oct 09 '24 22:10 swansonk14

The fr_unbranched_alkane change was in the rdkit not descriptastorus. 2.7.0.5 makes all versions of rdkit return the value prior to rdkit 2024.03.

This, unfortunately may change your results but it will be stable going forwards.

if you want to standardize on the newer fr branched_alkane I can make an option for that as well.

bp-kelley avatar Oct 10 '24 01:10 bp-kelley

Ah I see, thank you for explaining! I agree that this seems like the right solution, at least by default. I'll just be careful to note which combination of descriptastorus and rdkit versions I used for my projects so that the results are reproducible. Thank you again for fixing this!

swansonk14 avatar Oct 11 '24 19:10 swansonk14