openff-toolkit Molecule.enumerate_tautomers() can return empty list

trafficstars

Describe the bug Molecule.enumerate_tautomers() can return an empty list---rather than the single tautomer or original molecule---if tautomer enumeration fails.

To Reproduce

>>> print(openff.toolkit.__version__)
0.11.3
>>> from openff.toolkit.topology import Molecule
>>> offmol = Molecule.from_smiles('[H]C1(C(C(C(C(C1([H])[H])([H])[H])([H])[C@]([H])(C([H])([H])[H])N([H])[H])([H])[H])([H])[H])[H]')
>>> offmol.enumerate_tautomers()
[]

Computing environment (please complete the following information):

Operating system: linux64
Output of running conda list

# packages in environment at /lila/home/chodera/miniconda/envs/perses-dev: # # Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 2_gnu conda-forge alabaster 0.7.12 py_0 conda-forge alsa-lib 1.2.3.2 h166bdaf_0 conda-forge amberlite 22.0 pypi_0 pypi ambertools 22.0 py310h206695f_3 conda-forge amberutils 21.0 pypi_0 pypi anyio 3.6.2 pyhd8ed1ab_0 conda-forge argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge argon2-cffi-bindings 21.2.0 pypi_0 pypi arpack 3.7.0 hdefa2d7_2 conda-forge arsenic 0.2.1 py310hff52083_0 conda-forge asttokens 2.1.0 pyhd8ed1ab_0 conda-forge astunparse 1.6.3 pyhd8ed1ab_0 conda-forge attrs 22.1.0 pyh71513ae_1 conda-forge autograd 1.5 pyhd8ed1ab_0 conda-forge babel 2.11.0 pyhd8ed1ab_0 conda-forge backcall 0.2.0 pyh9f0ad1d_0 conda-forge backports 1.1 pyhd3eb1b0_0 backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge beautifulsoup4 4.11.1 pyha770c72_0 conda-forge blas 1.1 openblas conda-forge bleach 5.0.1 pyhd8ed1ab_0 conda-forge blosc 1.21.1 h83bc5f7_3 conda-forge bokeh 2.4.3 pypi_0 pypi boost 1.74.0 py310h7c3ba0c_5 conda-forge boost-cpp 1.74.0 h6cacc03_7 conda-forge boto3 1.26.3 pyhd8ed1ab_0 conda-forge botocore 1.29.3 pyhd8ed1ab_0 conda-forge brotli 1.0.9 h166bdaf_8 conda-forge brotli-bin 1.0.9 h166bdaf_8 conda-forge brotlipy 0.7.0 pypi_0 pypi bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.18.1 h7f98852_0 conda-forge ca-certificates 2022.10.11 h06a4308_0 cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 5.2.0 pyhd8ed1ab_0 conda-forge cairo 1.16.0 ha12eb4b_1010 conda-forge certifi 2022.9.24 pyhd8ed1ab_0 conda-forge cffi 1.15.1 pypi_0 pypi cftime 1.6.2 pypi_0 pypi charset-normalizer 2.1.1 pyhd8ed1ab_0 conda-forge click 8.1.3 pypi_0 pypi cloudpathlib 0.9.0 pypi_0 pypi cloudpathlib-s3 0.9.0 py310hff52083_1 conda-forge cloudpickle 2.2.0 pyhd8ed1ab_0 conda-forge colorama 0.4.6 pyhd8ed1ab_0 conda-forge commonmark 0.9.1 py_0 conda-forge contourpy 1.0.6 pypi_0 pypi coverage 6.5.0 pypi_0 pypi cryptography 38.0.3 pypi_0 pypi cudatoolkit 10.2.89 hca6d478_8 jaimergp/label/unsupported-cudatoolkit-shim curl 7.86.0 h2283fc2_1 conda-forge cycler 0.11.0 pyhd8ed1ab_0 conda-forge cython 0.29.32 pypi_0 pypi cytoolz 0.12.0 pypi_0 pypi dask 2022.10.2 pyhd8ed1ab_0 conda-forge dask-core 2022.10.2 pyhd8ed1ab_0 conda-forge dask-jobqueue 0.8.1 pyhd8ed1ab_0 conda-forge dataclasses 0.8 pyhc8e2a94_3 conda-forge dbus 1.13.18 hb2f20db_0 debugpy 1.6.3 pypi_0 pypi decorator 5.1.1 pyhd8ed1ab_0 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge dicttoxml 1.7.4 pyhd8ed1ab_2 conda-forge distributed 2022.10.2 pyhd8ed1ab_0 conda-forge docutils 0.19 pypi_0 pypi entrypoints 0.4 pyhd8ed1ab_0 conda-forge exceptiongroup 1.0.1 pyhd8ed1ab_0 conda-forge execnet 1.9.0 pyhd8ed1ab_0 conda-forge executing 1.2.0 pyhd8ed1ab_0 conda-forge expat 2.5.0 h27087fc_0 conda-forge fftw 3.3.10 nompi_hf0379b8_105 conda-forge fire 0.4.0 pyh44b312d_0 conda-forge flit-core 3.8.0 pyhd8ed1ab_0 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 hab24e00_0 conda-forge fontconfig 2.14.1 hc2a2eb6_0 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.38.0 pypi_0 pypi freetype 2.12.1 hca18f0e_0 conda-forge fsspec 2022.10.0 pyhd8ed1ab_0 conda-forge future 0.18.2 pypi_0 pypi gettext 0.21.1 h27087fc_0 conda-forge glib 2.74.1 h6239696_1 conda-forge glib-tools 2.74.1 h6239696_1 conda-forge greenlet 2.0.1 pypi_0 pypi gst-plugins-base 1.20.2 hcf0ee16_0 conda-forge gstreamer 1.20.3 hd4edc92_2 conda-forge hdf4 4.2.15 h9772cbc_5 conda-forge hdf5 1.12.2 nompi_h4df4325_100 conda-forge heapdict 1.0.1 py_0 conda-forge icu 69.1 h9c3ff4c_0 conda-forge idna 3.4 pyhd8ed1ab_0 conda-forge imagesize 1.4.1 pyhd8ed1ab_0 conda-forge importlib-metadata 5.0.0 pyha770c72_1 conda-forge importlib_resources 5.10.0 pyhd8ed1ab_0 conda-forge iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge ipykernel 6.17.0 pyh210e3f2_0 conda-forge ipython 8.6.0 pyh41d4057_1 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.7.2 pyhd8ed1ab_0 conda-forge jedi 0.18.1 pypi_0 pypi jinja2 3.1.2 pyhd8ed1ab_1 conda-forge jmespath 1.0.1 pyhd8ed1ab_0 conda-forge joblib 1.2.0 pyhd8ed1ab_0 conda-forge jpeg 9e h166bdaf_2 conda-forge jsonschema 4.17.0 pyhd8ed1ab_0 conda-forge jupyter-core 4.11.2 pypi_0 pypi jupyter_client 7.3.4 pyhd8ed1ab_0 conda-forge jupyter_core 4.11.2 py310hff52083_0 conda-forge jupyter_server 1.23.0 pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.2.2 pyhd8ed1ab_0 conda-forge jupyterlab_widgets 1.1.1 pyhd8ed1ab_0 conda-forge keyutils 1.6.1 h166bdaf_0 conda-forge kiwisolver 1.4.4 pypi_0 pypi krb5 1.19.3 h08a2579_0 conda-forge lcms2 2.14 h6ed2654_0 conda-forge ld_impl_linux-64 2.39 hc81fddc_0 conda-forge lerc 4.0.0 h27087fc_0 conda-forge libblas 3.9.0 16_linux64_openblas conda-forge libbrotlicommon 1.0.9 h166bdaf_8 conda-forge libbrotlidec 1.0.9 h166bdaf_8 conda-forge libbrotlienc 1.0.9 h166bdaf_8 conda-forge libcblas 3.9.0 16_linux64_openblas conda-forge libclang 13.0.1 default_hc23dcda_0 conda-forge libcurl 7.86.0 h2283fc2_1 conda-forge libdeflate 1.14 h166bdaf_0 conda-forge libedit 3.1.20210910 h7f8727e_0 libev 4.33 h516909a_1 conda-forge libevent 2.1.10 h28343ad_4 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc 7.2.0 h69d50b8_2 conda-forge libgcc-ng 12.2.0 h65d4601_19 conda-forge libgfortran 3.0.0 1 conda-forge libgfortran-ng 12.2.0 h69a702a_19 conda-forge libgfortran5 12.2.0 h337968e_19 conda-forge libglib 2.74.1 h606061b_1 conda-forge libgomp 12.2.0 h65d4601_19 conda-forge libiconv 1.17 h166bdaf_0 conda-forge liblapack 3.9.0 16_linux64_openblas conda-forge libllvm11 11.1.0 he0ac6c6_5 conda-forge libllvm13 13.0.1 hf817b99_2 conda-forge libnetcdf 4.8.1 nompi_h21705cb_104 conda-forge libnghttp2 1.47.0 hff17c54_1 conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libogg 1.3.5 h27cfd23_1 libopenblas 0.3.21 pthreads_h78a6416_3 conda-forge libopus 1.3.1 h7f98852_1 conda-forge libpng 1.6.38 h753d276_0 conda-forge libpq 14.5 he2d8382_1 conda-forge libsodium 1.0.18 h516909a_1 conda-forge libsqlite 3.39.4 h753d276_0 conda-forge libssh2 1.10.0 hf14f497_3 conda-forge libstdcxx-ng 12.2.0 h46fd767_19 conda-forge libtiff 4.4.0 h55922b4_4 conda-forge libuuid 2.32.1 h14c3975_1000 conda-forge libvorbis 1.3.7 he1b5a44_0 conda-forge libwebp-base 1.2.4 h166bdaf_0 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libxkbcommon 1.0.3 he3ba5ed_0 conda-forge libxml2 2.9.12 h885dcf4_1 conda-forge libxslt 1.1.33 h0ef7038_3 conda-forge libzip 1.9.2 hc929e4a_1 conda-forge libzlib 1.2.13 h166bdaf_4 conda-forge llvmlite 0.39.1 pypi_0 pypi locket 1.0.0 pyhd8ed1ab_0 conda-forge lxml 4.8.0 pypi_0 pypi lz4 4.0.2 pypi_0 pypi lz4-c 1.9.3 h9c3ff4c_1 conda-forge lzo 2.10 h516909a_1000 conda-forge markupsafe 2.1.1 pypi_0 pypi matplotlib 3.6.2 pypi_0 pypi matplotlib-base 3.6.2 py310h8d5ebf3_0 conda-forge matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge mdtraj 1.9.7 pypi_0 pypi mistune 2.0.4 pyhd8ed1ab_0 conda-forge mmpbsa-py 16.0 pypi_0 pypi mpiplus v0.0.1 pyhd8ed1ab_1003 conda-forge msgpack 1.0.4 pypi_0 pypi msgpack-python 1.0.4 py310hbf28c38_1 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge mysql-common 8.0.31 h26416b9_0 conda-forge mysql-libs 8.0.31 hbc51c84_0 conda-forge nbclassic 0.4.8 pyhd8ed1ab_0 conda-forge nbclient 0.7.0 pyhd8ed1ab_0 conda-forge nbconvert 7.2.3 pyhd8ed1ab_0 conda-forge nbconvert-core 7.2.3 pyhd8ed1ab_0 conda-forge nbconvert-pandoc 7.2.3 pyhd8ed1ab_0 conda-forge nbformat 5.7.0 pyhd8ed1ab_0 conda-forge ncurses 6.3 h27087fc_1 conda-forge nest-asyncio 1.5.6 pyhd8ed1ab_0 conda-forge netcdf-fortran 4.6.0 nompi_hc402ea5_101 conda-forge netcdf4 1.6.1 pypi_0 pypi networkx 2.8.8 pyhd8ed1ab_0 conda-forge nomkl 3.0 0 nose 1.3.7 py_1006 conda-forge notebook 6.5.2 pyha770c72_1 conda-forge notebook-shim 0.2.2 pyhd8ed1ab_0 conda-forge nspr 4.33 h295c915_0 nss 3.78 h2350873_0 conda-forge numba 0.56.3 pypi_0 pypi numexpr 2.8.3 pypi_0 pypi numpy 1.23.4 pypi_0 pypi numpydoc 1.5.0 pyhd8ed1ab_0 conda-forge ocl-icd 2.3.1 h7f98852_0 conda-forge ocl-icd-system 1.0.0 1 conda-forge openblas 0.3.21 pthreads_h320a7e8_3 conda-forge openeye-toolkits 2022.1.1 py310_0 openeye openeye-toolkits-python3-linux-x64 2022.1.1 pypi_0 pypi openff-amber-ff-ports 0.0.3 pyh6c4a22f_0 conda-forge openff-arsenic 0+untagged.10.g0234410.dirty pypi_0 pypi openff-forcefields 2.0.0 pyh6c4a22f_0 conda-forge openff-interchange 0.2.2 pyhd8ed1ab_0 conda-forge openff-interchange-base 0.2.2 pyhd8ed1ab_0 conda-forge openff-toolkit 0.11.3 pyhd8ed1ab_0 conda-forge openff-toolkit-base 0.11.3 pyhd8ed1ab_0 conda-forge openff-units 0.2.0 pyh1a96a4e_0 conda-forge openff-utilities 0.1.7 pyh1a96a4e_0 conda-forge openjpeg 2.5.0 h7d73246_1 conda-forge openmm 7.7.0 pypi_0 pypi openmmforcefields 0.11.2 pyhd8ed1ab_1 conda-forge openmmtools 0.21.5 pyhd8ed1ab_0 conda-forge openmoltools 0.8.8 pyhd8ed1ab_1 conda-forge openssl 3.0.7 h166bdaf_0 conda-forge packaging 21.3 pyhd8ed1ab_0 conda-forge packmol 1!18.013 0 omnia packmol-memgen 1.2.3rc0 pypi_0 pypi pandas 1.5.1 pypi_0 pypi pandoc 2.19.2 h32600fe_1 conda-forge pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge panedr 0.5.2 py_0 conda-forge parmed 3.4.3 pypi_0 pypi parso 0.8.3 pyhd8ed1ab_0 conda-forge partd 1.3.0 pyhd8ed1ab_0 conda-forge patsy 0.5.3 pyhd8ed1ab_0 conda-forge pbr 5.11.0 pyhd8ed1ab_0 conda-forge pcre2 10.40 hc3806b6_0 conda-forge pdb4amber 22.0 pypi_0 pypi pdbfixer 1.8.1 pyh6c4a22f_0 conda-forge perl 5.34.0 h5eee18b_1 perses 0.10.1+24.gec9bfb0 dev_0 pexpect 4.8.0 pyh9f0ad1d_2 conda-forge pickleshare 0.7.5 py_1003 conda-forge pillow 9.2.0 pypi_0 pypi pint 0.20.1 pyhd8ed1ab_0 conda-forge pip 22.3.1 pyhd8ed1ab_0 conda-forge pixman 0.40.0 h36c2ea0_0 conda-forge pkgutil-resolve-name 1.3.10 pyhd8ed1ab_0 conda-forge plotly 5.11.0 pyhd8ed1ab_0 conda-forge pluggy 1.0.0 pypi_0 pypi progressbar2 4.2.0 pyhd8ed1ab_0 conda-forge prometheus_client 0.15.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.32 pyha770c72_0 conda-forge psutil 5.9.3 pypi_0 pypi pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge pycairo 1.21.0 pypi_0 pypi pycparser 2.21 pyhd8ed1ab_0 conda-forge pydantic 1.10.2 pypi_0 pypi pygments 2.13.0 pyhd8ed1ab_0 conda-forge pymbar 3.0.5 pypi_0 pypi pymsmt 22.0 pypi_0 pypi pyopenssl 22.1.0 pyhd8ed1ab_0 conda-forge pyparsing 3.0.9 pyhd8ed1ab_0 conda-forge pyqt 5.12.3 py310hff52083_8 conda-forge pyqt-impl 5.12.3 py310h1f8e252_8 conda-forge pyqt5 5.12.3 pypi_0 pypi pyqt5-sip 4.19.18 pypi_0 pypi pyqtchart 5.12 pypi_0 pypi pyqtwebengine 5.12.1 pypi_0 pypi pyrsistent 0.19.2 pypi_0 pypi pysocks 1.7.1 pypi_0 pypi pytables 3.7.0 py310hb60b9b2_3 conda-forge pytest 7.2.0 pypi_0 pypi pytest-attrib 0.1.3 pyhd8ed1ab_0 conda-forge pytest-cov 4.0.0 pyhd8ed1ab_0 conda-forge pytest-xdist 3.0.2 pyhd8ed1ab_0 conda-forge python 3.10.6 ha86cf86_0_cpython conda-forge python-constraint 1.4.0 py_0 conda-forge python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python-fastjsonschema 2.16.2 pyhd8ed1ab_0 conda-forge python-utils 3.4.5 pyhd8ed1ab_0 conda-forge python_abi 3.10 2_cp310 conda-forge pytraj 2.0.6 pypi_0 pypi pytz 2022.6 pyhd8ed1ab_0 conda-forge pyyaml 6.0 pypi_0 pypi pyzmq 24.0.1 pypi_0 pypi qt 5.12.9 h1304e3e_6 conda-forge rdkit 2022.09.1 py310h10c98a6_0 conda-forge readline 8.2 h5eee18b_0 reportlab 3.5.68 pypi_0 pypi requests 2.28.1 pyhd8ed1ab_1 conda-forge rich 12.6.0 pyhd8ed1ab_0 conda-forge s3transfer 0.6.0 pyhd8ed1ab_0 conda-forge sander 22.0 pypi_0 pypi scikit-learn 1.1.3 pypi_0 pypi scipy 1.9.3 pypi_0 pypi seaborn 0.12.1 hd8ed1ab_0 conda-forge seaborn-base 0.12.1 pyhd8ed1ab_0 conda-forge send2trash 1.8.0 pyhd8ed1ab_0 conda-forge setuptools 65.5.1 pyhd8ed1ab_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge smirnoff99frosst 1.1.0 pyh44b312d_0 conda-forge snappy 1.1.9 hbd366e4_2 conda-forge sniffio 1.3.0 pyhd8ed1ab_0 conda-forge snowballstemmer 2.2.0 pyhd8ed1ab_0 conda-forge sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge soupsieve 2.3.2.post1 pyhd8ed1ab_0 conda-forge sphinx 5.3.0 pyhd8ed1ab_0 conda-forge sphinxcontrib-applehelp 1.0.2 py_0 conda-forge sphinxcontrib-devhelp 1.0.2 py_0 conda-forge sphinxcontrib-htmlhelp 2.0.0 pyhd8ed1ab_0 conda-forge sphinxcontrib-jsmath 1.0.1 py_0 conda-forge sphinxcontrib-qthelp 1.0.3 py_0 conda-forge sphinxcontrib-serializinghtml 1.1.5 pyhd8ed1ab_2 conda-forge sqlalchemy 1.4.43 pypi_0 pypi sqlite 3.39.4 h4ff8645_0 conda-forge stack_data 0.6.0 pyhd8ed1ab_0 conda-forge statsmodels 0.13.5 pypi_0 pypi tables 3.7.0 pypi_0 pypi tblib 1.7.0 pyhd8ed1ab_0 conda-forge tenacity 8.1.0 pyhd8ed1ab_0 conda-forge termcolor 2.1.0 pyhd8ed1ab_0 conda-forge terminado 0.17.0 pyh41d4057_0 conda-forge threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge tinycss2 1.2.1 pyhd8ed1ab_0 conda-forge tinydb 4.7.0 pyhd8ed1ab_0 conda-forge tk 8.6.12 h27826a3_0 conda-forge toml 0.10.2 pyhd8ed1ab_0 conda-forge tomli 2.0.1 pyhd8ed1ab_0 conda-forge toolz 0.12.0 pyhd8ed1ab_0 conda-forge tornado 6.1 pypi_0 pypi tqdm 4.64.1 pyhd8ed1ab_0 conda-forge traitlets 5.5.0 pyhd8ed1ab_0 conda-forge typing-extensions 4.4.0 hd8ed1ab_0 conda-forge typing_extensions 4.4.0 pyha770c72_0 conda-forge tzdata 2022f h191b570_0 conda-forge unicodedata2 15.0.0 pypi_0 pypi urllib3 1.26.12 pypi_0 pypi validators 0.18.2 pyhd3deb0d_0 conda-forge wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge webencodings 0.5.1 py_1 conda-forge websocket-client 1.4.2 pyhd8ed1ab_0 conda-forge wheel 0.38.2 pyhd8ed1ab_0 conda-forge widgetsnbextension 3.6.1 pyha770c72_0 conda-forge xmltodict 0.13.0 pyhd8ed1ab_0 conda-forge xorg-kbproto 1.0.7 h14c3975_1002 conda-forge xorg-libice 1.0.10 h516909a_0 conda-forge xorg-libsm 1.2.3 hd9c2040_1000 conda-forge xorg-libx11 1.7.2 h7f98852_0 conda-forge xorg-libxau 1.0.9 h14c3975_0 conda-forge xorg-libxdmcp 1.1.3 h516909a_0 conda-forge xorg-libxext 1.3.4 h7f98852_1 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-libxt 1.2.1 h7f98852_2 conda-forge xorg-renderproto 0.11.1 h14c3975_1002 conda-forge xorg-xextproto 7.3.0 h14c3975_1002 conda-forge xorg-xproto 7.0.31 h14c3975_1007 conda-forge xz 5.2.6 h166bdaf_0 conda-forge yaml 0.2.5 h7f98852_2 conda-forge zeromq 4.3.4 h9c3ff4c_1 conda-forge zict 2.2.0 pyhd8ed1ab_0 conda-forge zipp 3.10.0 pyhd8ed1ab_0 conda-forge zlib 1.2.13 h166bdaf_4 conda-forge zstd 1.5.2 h6239696_4 conda-forge

Nov 14 '22 17:11 jchodera

I'm not sure it should return itself - if a molecule does not undergo tautomerization does it have tautomers?

Nov 14 '22 20:11 mattwthompson

It just has one tautomer: Itself.

Nov 14 '22 20:11 jchodera

Here's an example of how one might normally expect to use these methods:

for protomer in offmol.enumerate_protomers():
    for tautomer in protomer.enumerate_tautomers():
        for stereoisomer in tautomer.enumerate_stereoisomers():
            # process stereoisomer

If there is only one microstate for any of these enumeration methods (e.g. the molecule itself has only one protonation state, one tautomer, and/or one stereoisomer), it should clearly return an iterable containing only itself.

Nov 14 '22 20:11 jchodera

I will defer to @j-wags on what the behavior here should be, but I #1465 which would resolve this via a change in behavior.

Nov 15 '22 18:11 mattwthompson

This seems complicated because it will change the outputs of qcsubmit, which is an important artifact generator for our science team and is essential to the reproducibility of our work. Will we need a plan to coordinate changes with qcsubmit and qca-dataset-submission?

It will also require similar changes to enumerate_stereoisomers and enumerate_protomers.

enumerate_stereoisomers will be complicated - should it include the input if it had unassigned stereocenters? And is it OK if OE and RDKit disagree over what constitutes an unidentified stereocenter (#146)? Also what if the user requests rationalise=True but the input isn't rationalizable?

This seems like it would be a behavior change and expose a bunch of edge cases for little value.

Nov 21 '22 21:11 j-wags

Would an argument like return_self help here, where the default is the current default behaviour? iirc there were inconsistencies with the behaviour of enumerate_stereoisomers and enumerate_protomers, as detailed in https://github.com/openforcefield/openff-toolkit/issues/738.

Nov 21 '22 21:11 lilyminium

Thanks for pulling up that link, totally missed that there was a previous issue on this.

Adding return_self still doubles the number of possible input permutations that we need to handle, so that doesn't seem like an improvement from the "cost to make and maintain" side. And if we made it default to True then we'd still have the same "behavior change" problems as above, or if it was False users would continue complaining that the default behavior is surprising.

I'm going to tag this as medium priority, because Matt and I (and apparently Trevor and I) have put several hours into this and it's just getting more complex. Meanwhile there's lower-hanging, higher-value fruit elsewhere that needs fixing.

Nov 21 '22 21:11 j-wags

To turn this around, since it sounds like there isn't much appetite to make the API behave in an obvious and consistent manner or to document the current non-obvious implemented behavior:

If I want to enumerate all the microstates of a Molecule (protonation states, tautomers, and undefined stereochemistry [if any stereocenters/bonds are undefined]) using the current API, how would I do this?

Nov 28 '22 06:11 jchodera

I'm trying to piece this together from what QCSubmit does.

Molecule.enumerate_tautomers is called here:

        for molecule in molecules:
            try:
                tautomers = molecule.enumerate_tautomers(
                    max_states=self.max_tautomers, toolkit_registry=toolkit_registry
                )
                for tautomer in tautomers:
                    result.add_molecule(tautomer)
                # add the input molecule
                result.add_molecule(molecule=molecule)

            except Exception:
                result.filter_molecule(molecule)

which could be replaced with

for tautomer in molecule.enumerate_tautomers(
                    max_states=self.max_tautomers, toolkit_registry=toolkit_registry):
    result.add_molecule(molecule=tautomer)

with the proposed change in API to have enumerate_tautomers enumerate the list of reasonable tautomers.

Note also that this except Exception: is probably a very bad idea, since it could result in even minor unexpected issues (e.g. missing dependencies) silently filtering out molecules rather than passing on critical errors.

Molecule.enumerate_stereoisomers is called here and is much more confusing:

        for molecule in molecules:
            try:
                isomers = molecule.enumerate_stereoisomers(
                    undefined_only=self.undefined_only,
                    max_isomers=self.max_isomers,
                    rationalise=self.rationalise,
                    toolkit_registry=toolkit_registry,
                )

                # now check that each molecule is well defined
                for isomer in isomers:
                    if not check_missing_stereo(isomer):
                        result.add_molecule(isomer)

                # now check the input
                # rationalise if needed
                if self.rationalise:
                    molecule.generate_conformers(n_conformers=1)
                if not check_missing_stereo(molecule):
                    result.add_molecule(molecule)

            except Exception:
                result.filter_molecule(molecule)

If this behavior is needed to enumerate all undefined stereocenters/bonds, it would be sensible to incorporate it into Molecule.enumerate_stereoisomers since it will be difficult for others to follow that this is the idiom one would need to enumerate undefined stereocenters/bonds.

Finally, Molecule.enumerate_protomers is called here:

        for molecule in molecules:
            try:
                protomers = molecule.enumerate_protomers(max_states=self.max_states)

                for protomer in protomers:
                    result.add_molecule(protomer)
                result.add_molecule(molecule)

            except Exception:
                # if openeye is not available this will fail
                result.filter_molecule(molecule)

which is straightforward, but still is a mismatch for the API call---which seems to be "enumerate alternative protonation states that are not this one".

Since there is also no straightforward method to correctly enumerate all microstates of a molecule (e.g. Molecule.enumerate_microstates), this leads to potential issues if these enumeration schemes are also called in incorrect order. It appears that the default workflows in QCSubmit run afoul of this issue: In the openff.qcsubmit.state_enumeration workflow, these are called in this order:

    EnumerateProtomers,
    EnumerateStereoisomers,
    EnumerateTautomers,

Because the enumerate_tautomers step can change the number of protons on a nitrogen, this could create or delete stereocenters at nitrogens (e.g. on a piperazine ring) which might fail to be enumerated. My understanding is that the correct order should be

    EnumerateProtomers,
    EnumerateTautomers,
    EnumerateStereoisomers,

But it's easy to see why this could potentially cause confusion.

My recommendations would be:

Molecule.enumerate_protomers() should enumerate all reasonable protonation states of the molecule (including the input state)
Molecule.enumerate_tautomers() should enumerate all reasonable tautomeric states of the molecule (including the input state)
Molecule.enumerate_stereoisomers() should enumerate all uncertain/undefined stereocenters/bonds of the molecule (including the input state)
Molecule.enumerate_microstates() should call all of these in the correct order (protomers, tautomers, stereoisomers) to avoid confusion
Exceptions for each of these should be clearly documented in the docstrings so they can be properly handled.
QCSubmit should avoid except Exception which might drop microstates or molecules silently
QCSubmit should correct the enumeration order in its default workflow and/or offer the simpler EnumerateMicrostates that will do this correctly via Molecule.enumerate_microstates()

I totally understand there are higher-priority issues to address, of course, but it would be great to tidy this up while folks are still updating to the new API.

Nov 28 '22 07:11 jchodera

From what I remember and what @jchodera has posted above, QCSubmit also has to work around the issue of the input molecule not being returned by the method, so this change could slightly simplify it. What's more, if this would make the behaviour consistent with the backend toolkits this could help make usage easier.

On the other hand, this current behaviour is documented in the returns sections of each function maybe this should also be reflected in the top description?

It appears that the default workflows in QCSubmit run afoul of this issue

That's not quite right QCSubmit makes no assumptions about what the user might want to run and they are free to construct each workflow in any order they see fit so they could end up in this situation and I agree that a workflow component such a EnumerateMicrostates would be great! The ordering you refer to though is just alphabetical and provided by isort.

Dec 20 '22 09:12 jthorton

Just wondering if there's any interest in accepting a fix for this behavior. I'd be happy to help, since we use this functionality frequently!

Jul 02 '23 23:07 jchodera

Just wanted to check in whether there was any interest in accepting a fix for this behavior, since it has come up again recently.

Aug 29 '23 20:08 jchodera

Sure - Happy to accept a fix! Recommendations 1-3, 4, and 5 would probably be best as separate PRs to the OFF Toolkit, and 6+7 would be to QCSubmit. They seem separable so the PRs can be made in any order and won't block each other.

Aug 30 '23 03:08 j-wags

I'll take a stab at doing 1-3, since they seem tractable, and update here if I would like to pass something on to others.

Nov 28 '23 17:11 mattwthompson

openff-toolkit openff-toolkit copied to clipboard

Molecule.enumerate_tautomers() can return empty list

openff-toolkit
openff-toolkit copied to clipboard