Morgan fingerprint generator warnings
Hi @bp-kelley,
Thank you for the great package!
I noticed recently that with newer versions of RDKit (e.g., 2024.3.5), running the descriptastorus.descriptors.rdNormalizedDescriptors.RDKit2DNormalized descriptor prints out the warning message DEPRECATION WARNING: please use MorganGenerator. I believe this is due to a suggested change in how Morgan fingerprints are generated, as detailed here: https://greglandrum.github.io/rdkit-blog/posts/2023-01-18-fingerprint-generator-tutorial.html
Would you be able to update the Morgan fingerprint generation code to remove these deprecation warnings?
Thanks! Kyle
I'm releasing a new version next week, these warnings will be gone. Thanks for the updates.
Great, thank you so much!!
Sorry for the delay, I had to fix an issue with fr_unbrch_alkane that was changing the distributions.
The master branch is now updated and you should be able to install from there while I get the newer version up to pypi.
2.7.0 is up on PyPi
Hi @bp-kelley,
Thank you so much for working on this!
Unfortunately, I still see a few issues after upgrading to 2.7.0.
-
The warning
DEPRECATION WARNING: please use MorganGeneratorstill appears afterFpDensityMorgan1,FpDensityMorgan2, andFpDensityMorgan3. -
Now when running the
RDKit2DNormalizedgenerator, it prints out the name of every property (e.g.,== BalabanJ,== BertzCT,== Chi0, etc.) for every molecule, which clutters the output. -
There seem to be some small changes with the
fr_unbrch_alkaneproperty. For example, the SMILESC=CCSCC(=O)N(CC)CC(CC)NC(=O)c1cccc(O)c1OC(F)(F)Fhas afr_unbrch_alkanevalue of4.7035982e-08using version 2.6.1 but a value of0.9607067using version 2.7.0. (This discrepancy appeared for 6 out of the 177 molecules I tried.)
I'm sorry to ask for more help, but would you be able to look into these issues?
Thanks! Kyle
This looks very much like I didn't upload the correct build. Could you try the same tests with the master branch?
fr_branched_alkane changed in the latest rdkit, so I had to put a fix in, I'll double check to see what exactly was uploaded to pypi
I yanked the release for now
I see the "==
Could you also try this, it looks like if qed doesn't exist in the rdkit, I fall back to an internal version which has the deprecations
from rdkit.Chem.Descriptors import qed
Finally, if you have your testing code that would be useful. the fr_unbranched_alkane changed in the rdkit and I had to readd the older version, but there may be a code path where you get the new one.
I tried from rdkit.Chem.Descriptors import qed and it seems to work, so my version of RDKit has qed (I'm using version 2024.3.5).
I tried installing the version from the master branch but I'm having the same issues. Here's my test code:
from descriptastorus.descriptors import rdNormalizedDescriptors
generator = rdNormalizedDescriptors.RDKit2DNormalized()
smiles = "C=CCSCC(=O)N(CC)CC(CC)NC(=O)c1cccc(O)c1OC(F)(F)F"
rdkit_fp = generator.process(smiles)[1:]
print(rdkit_fp[197])
When I use version 2.6.1, I get 4.7035980879892365e-08.
However, when I use the version on master (at commit 3f7626ecdc7582df5a148600aa683e4bd768d216), I get 0.960706684375328.
Please let me know if there's anything else you would like me to try.
The difference is because the RDKit changed the definition of fr_unbrch_alkane and descriptastorus was fitted on the old version (== 0.97), to make all versions of descriptastorus the same, I reverted to the old version of the code. Version 2.6.1 with a newer rdkit would show the incorrect results, or at least, the results there were fitted with the old version.
This was changed here: https://github.com/rdkit/rdkit/pull/7255 in rdkit 2024.03, unfortunately this never ended up in the change list, so I didn't find it until I reran my regression tests
Once this PR gets sorted, I'll be releasing the next version that fits on the current results and adds some descriptors.
I'm still at a loss regarding the FpDensity descriptors, what version of the RDKit are you using.
Ah okay I see, thank you for working on this! I'm using RDKit version 2024.3.5.
It should all be resolved now except for the change fr_unbranched_alkane, I'm using the older version for compatibility. Let me now if you need an option to use the newer one.
https://pypi.org/project/descriptastorus
https://pypi.org/project/descriptastorus/2.7.0.4/
Hi @bp-kelley,
Thanks for making those changes! It looks like the deprecation warning is gone which is great!
Just to clarify, for the fr_unbranched_alkane, should I expect to get the same values in version 2.6.1 and in version 2.7.0.5, or will they be different? I just tried the same example molecule from https://github.com/bp-kelley/descriptastorus/issues/32#issuecomment-2378279906, and I'm getting different values in version 2.6.1 and version 2.7.0.5.
Thanks, Kyle
The fr_unbranched_alkane change was in the rdkit not descriptastorus. 2.7.0.5 makes all versions of rdkit return the value prior to rdkit 2024.03.
This, unfortunately may change your results but it will be stable going forwards.
if you want to standardize on the newer fr branched_alkane I can make an option for that as well.
Ah I see, thank you for explaining! I agree that this seems like the right solution, at least by default. I'll just be careful to note which combination of descriptastorus and rdkit versions I used for my projects so that the results are reproducible. Thank you again for fixing this!