knime-rdkit

Switch to using new R-Group Decomposition code

Open greglandrum opened this issue 5 years ago • 4 comments

This probably means deprecating the current node since the options are now different.

The constructor for the data structure with the parameters for the C++ RGD code, which includes the default values, is here: https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/RGroupDecomposition/RGroupDecomp.h#L66

From a quick skim of the generated code, it looks like all of this is exposed to Java in a sensible way.

greglandrum avatar Mar 15 '19 08:03 greglandrum

@greglandrum, a few questions:

  1. Which parameters do we want to support besides the input Mol column?
  2. Is there a way to know the maximum number of R groups that will be in the output? (In the past we derived that number from the number of heavy atoms in the SMARTS core that was submitted as a parameter.)
  3. Some examples would be good. Brian said there are many available...

manuelschwarze avatar Sep 25 '19 14:09 manuelschwarze

@bp-kelley, feel free to answer as well

manuelschwarze avatar Sep 25 '19 15:09 manuelschwarze

@manuelschwarze, answers (I hope):

  1. I updated the description to include a pointer to the parameters. I will try to find some time to list those explicitly, but in case that doesn't happen, the line of C++ that I point to includes all the required info.
  2. Why do you need the max number of R groups and how important is it that the answer be correct? Most of the time you really don't know the number (or names) of output columns until you run the calculation.
  3. Sure, examples/tests are relatively easy to come up with by running some Python.
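
To illustrate point 2, here is a minimal Python sketch, assuming each decomposition result arrives as a mapping from a label (`Core`, `R1`, ...) to a fragment string. The helper names `derive_columns` and `to_table` and the sample rows are made up for illustration and are not part of RDKit or KNIME. The idea is to collect all results first and only then derive the set of output columns.

```python
import re


def derive_columns(rows):
    """Return the union of labels over all rows: non-R labels first, then R groups in numeric order."""
    labels = set()
    for row in rows:
        labels.update(row)

    def sort_key(label):
        match = re.fullmatch(r"R(\d+)", label)
        # Labels such as 'Core' sort before the numbered R groups.
        return (1, int(match.group(1)), label) if match else (0, 0, label)

    return sorted(labels, key=sort_key)


def to_table(rows, columns, missing=None):
    """Lay out every row against the same column list, filling gaps with `missing`."""
    return [[row.get(column, missing) for column in columns] for row in rows]


# Hypothetical results; the strings are placeholders, not output of a real run.
rows = [
    {"Core": "c1ccc([*:1])cc1", "R1": "F[*:1]"},
    {"Core": "c1ccc([*:1])c([*:2])c1", "R1": "Cl[*:1]", "R2": "C[*:2]"},
    {"Core": "c1ccc([*:1])cc1", "R1": "Br[*:1]", "R10": "O[*:10]"},
]

columns = derive_columns(rows)
table = to_table(rows, columns)
```

The cost of this approach is a second pass over the results, which may or may not fit a streaming table API.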

greglandrum avatar Sep 26 '19 04:09 greglandrum

@greglandrum

  1. Thanks. Much clearer now.
  2. Before filling columns in KNIME I need to create them, but at that point the RGroupDecomp has not run yet. It is complicated to add columns to a table dynamically while you are already filling in data, and I would like to avoid that if possible. What I could do is always create N columns (N very high) and then, after filling the table, remove the unused (empty) ones in a post-processing step; that is quite easy. The question in that case is what N should be. Is there a natural limit on how many R groups can be created?
  3. Brian mentioned there are examples already available. Any useful link you can provide?
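
The "create N columns, then prune" idea from point 2 can be sketched in plain Python as below. This is only an illustration under stated assumptions: results are taken to be mappings from label to fragment string, `make_wide_table` and `drop_empty_columns` are hypothetical helpers, and the value of N is arbitrary. It is not the KNIME table API.

```python
def make_wide_table(rows, n_max):
    """Pre-create 'Core' plus R1..R<n_max> and fill each row, leaving unused cells as None."""
    columns = ["Core"] + [f"R{i}" for i in range(1, n_max + 1)]
    known = set(columns)
    table = []
    for row in rows:
        extra = set(row) - known
        if extra:
            # Fail loudly instead of silently dropping data when N was chosen too small.
            raise ValueError(f"labels outside the pre-created columns: {sorted(extra)}")
        table.append([row.get(column) for column in columns])
    return columns, table


def drop_empty_columns(columns, table):
    """Post-processing step: keep only columns with at least one non-empty cell."""
    keep = [j for j in range(len(columns)) if any(r[j] is not None for r in table)]
    return [columns[j] for j in keep], [[r[j] for j in keep] for r in table]


# Hypothetical results; the strings are placeholders, not output of a real run.
rows = [
    {"Core": "c1ccc([*:1])cc1", "R1": "F[*:1]"},
    {"Core": "c1ccc([*:1])c([*:2])c1", "R1": "Cl[*:1]", "R2": "C[*:2]"},
]

wide_columns, wide_table = make_wide_table(rows, n_max=8)
pruned_columns, pruned_table = drop_empty_columns(wide_columns, wide_table)
```

The explicit check in `make_wide_table` is the design point: with a fixed N, any result that exceeds it has to be detected rather than truncated.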

manuelschwarze avatar Sep 26 '19 07:09 manuelschwarze