openff-toolkit
openff-toolkit copied to clipboard
`from_smiles` ignores mapped smiles
Describe the bug
The Molecule.from_smiles()
method silently ignores any atomic indexing/mapping present in the SMILES string.
To Reproduce
from openff.toolkit.topology.molecule import Molecule
unmapped_mol_from_smiles = Molecule.from_smiles("HOH")
mol_from_smiles = Molecule.from_smiles("[H:1][O:3][H:2]")
mol_from_mapped_smiles = Molecule.from_mapped_smiles("[H:1][O:3][H:2]")
>>> unmapped_mol_from_smiles.atoms
[Atom(name=, atomic number=1),
Atom(name=, atomic number=8),
Atom(name=, atomic number=1)]
>>> mol_from_smiles.atoms
[Atom(name=, atomic number=1),
Atom(name=, atomic number=8),
Atom(name=, atomic number=1)]
>>> mol_from_mapped_smiles.atoms
[Atom(name=, atomic number=1),
Atom(name=, atomic number=1),
Atom(name=, atomic number=8)]
Computing environment (please complete the following information):
- Arch Linux
Output of `conda list`
# Name Version Build Channel _libgcc_mutex 0.1 conda_forge conda-forge _openmp_mutex 4.5 1_gnu conda-forge abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge alabaster 0.7.12 py_0 conda-forge amberlite 16.0 pypi_0 pypi ambertools 21.11 py39hc630cb1_0 conda-forge amberutils 21.0 pypi_0 pypi amqp 5.0.9 pyhd8ed1ab_0 conda-forge anyio 3.5.0 py39hf3d152e_0 conda-forge argcomplete 2.0.0 pyhd8ed1ab_0 conda-forge argon2-cffi 21.3.0 pyhd8ed1ab_0 conda-forge argon2-cffi-bindings 21.2.0 py39h3811e60_1 conda-forge arpack 3.7.0 hdefa2d7_2 conda-forge arrow-cpp 6.0.1 py39h2c67c1e_5_cpu conda-forge asgiref 3.5.0 pyhd8ed1ab_0 conda-forge asttokens 2.0.5 pyhd8ed1ab_0 conda-forge astunparse 1.6.3 pyhd8ed1ab_0 conda-forge attrs 21.4.0 pyhd8ed1ab_0 conda-forge autodoc-pydantic 1.6.1 pyhd8ed1ab_0 conda-forge aws-c-auth 0.6.8 hadad3cd_1 conda-forge aws-c-cal 0.5.12 h70efedd_7 conda-forge aws-c-common 0.6.17 h7f98852_0 conda-forge aws-c-compression 0.2.14 h7c7754b_7 conda-forge aws-c-event-stream 0.2.7 hd2be095_32 conda-forge aws-c-http 0.6.10 h416565a_3 conda-forge aws-c-io 0.10.14 he836878_0 conda-forge aws-c-mqtt 0.7.10 h885097b_0 conda-forge aws-c-s3 0.1.29 h8d70ed6_0 conda-forge aws-c-sdkutils 0.1.1 h7c7754b_4 conda-forge aws-checksums 0.1.12 h7c7754b_6 conda-forge aws-crt-cpp 0.17.10 h6ab17b9_5 conda-forge aws-sdk-cpp 1.9.160 h36ff4c5_0 conda-forge babel 2.9.1 pyh44b312d_0 conda-forge backcall 0.2.0 pyh9f0ad1d_0 conda-forge backports 1.0 py_2 conda-forge backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge basis_set_exchange 0.9 pyhd8ed1ab_0 conda-forge beautifulsoup4 4.10.0 pypi_0 pypi billiard 3.6.4.0 py39h3811e60_1 conda-forge black 21.12b0 pyhd8ed1ab_0 conda-forge bleach 4.1.0 pyhd8ed1ab_0 conda-forge blosc 1.21.0 h9c3ff4c_0 conda-forge boost 1.74.0 py39h5472131_4 conda-forge boost-cpp 1.74.0 h359cf19_5 conda-forge brotli 1.0.9 h7f98852_6 conda-forge brotli-bin 1.0.9 h7f98852_6 conda-forge brotlipy 0.7.0 py39h3811e60_1003 conda-forge bzip2 1.0.8 h7f98852_4 conda-forge c-ares 1.18.1 h7f98852_0 conda-forge ca-certificates 2021.10.8 ha878542_0 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 5.0.0 pyhd8ed1ab_0 conda-forge cairo 1.16.0 ha00ac49_1009 conda-forge celery 5.1.2 pyhd8ed1ab_0 conda-forge certifi 2021.10.8 py39hf3d152e_1 conda-forge cffi 1.15.0 py39h4bc2ebd_0 conda-forge charset-normalizer 2.0.10 pyhd8ed1ab_0 conda-forge chemper 1.0.0 pyh9f0ad1d_0 conda-forge click 7.1.2 pyh9f0ad1d_0 conda-forge click-didyoumean 0.3.0 pyhd8ed1ab_0 conda-forge click-option-group 0.5.3 pyhd8ed1ab_0 conda-forge click-plugins 1.1.1 py_0 conda-forge click-repl 0.2.0 pyhd8ed1ab_0 conda-forge colorama 0.4.4 pyh9f0ad1d_0 conda-forge commonmark 0.9.1 py_0 conda-forge cryptography 36.0.1 py39h95dcef6_0 conda-forge css-html-js-minify 2.5.5 pypi_0 pypi cudatoolkit 11.6.0 habf752d_9 conda-forge curl 7.81.0 h2574ce0_0 conda-forge cycler 0.11.0 pyhd8ed1ab_0 conda-forge cython 0.29.26 py39he80948d_0 conda-forge dataclasses 0.8 pyhc8e2a94_3 conda-forge debugpy 1.5.1 py39he80948d_0 conda-forge decorator 5.1.1 pyhd8ed1ab_0 conda-forge defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge deprecated 1.2.13 pyh6c4a22f_0 conda-forge doctest-oxide 0+untagged.11.gc938329 pypi_0 pypi docutils 0.17.1 py39hf3d152e_1 conda-forge entrypoints 0.3 pyhd8ed1ab_1003 conda-forge executing 0.8.2 pyhd8ed1ab_0 conda-forge fastapi 0.67.0 pyhd8ed1ab_0 conda-forge fftw 3.3.10 nompi_h77c792f_102 conda-forge flit-core 3.6.0 pyhd8ed1ab_0 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 hab24e00_0 conda-forge fontconfig 2.13.1 hba837de_1005 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.28.5 py39h3811e60_0 conda-forge forcebalance 1.9.2 py39h5472131_1 conda-forge freetype 2.10.4 h0708190_1 conda-forge future 0.18.2 py39hf3d152e_4 conda-forge geometric 0.9.7.2 py_0 conda-forge gettext 0.19.8.1 h73d1719_1008 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge glog 0.5.0 h48cff8f_0 conda-forge greenlet 1.1.2 py39he80948d_1 conda-forge grpc-cpp 1.42.0 ha1441d3_1 conda-forge h11 0.12.0 pyhd8ed1ab_0 conda-forge h2 4.1.0 py39hf3d152e_0 conda-forge h5py 3.6.0 nompi_py39h7e08c79_100 conda-forge hdf4 4.2.15 h10796ff_3 conda-forge hdf5 1.12.1 nompi_h2750804_103 conda-forge hpack 4.0.0 pyh9f0ad1d_0 conda-forge httpcore 0.14.5 pyhd8ed1ab_0 conda-forge httpx 0.22.0 py39hf3d152e_0 conda-forge hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge icu 69.1 h9c3ff4c_0 conda-forge idna 3.3 pyhd8ed1ab_0 conda-forge imagesize 1.3.0 pyhd8ed1ab_0 conda-forge importlib-metadata 4.10.1 py39hf3d152e_0 conda-forge importlib_metadata 4.10.1 hd8ed1ab_0 conda-forge importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge iniconfig 1.1.1 pyh9f0ad1d_0 conda-forge ipykernel 6.7.0 py39hef51801_0 conda-forge ipython 8.0.1 py39hf3d152e_0 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge jbig 2.1 h7f98852_2003 conda-forge jedi 0.18.1 py39hf3d152e_0 conda-forge jinja2 3.0.3 pyhd8ed1ab_0 conda-forge jpeg 9d h36c2ea0_0 conda-forge jsonschema 4.4.0 pyhd8ed1ab_0 conda-forge jupyter_client 7.1.2 pyhd8ed1ab_0 conda-forge jupyter_core 4.9.1 py39hf3d152e_1 conda-forge jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge kiwisolver 1.3.2 py39h1a9c180_1 conda-forge kombu 5.2.3 py39hf3d152e_0 conda-forge krb5 1.19.2 hcc1bbae_3 conda-forge lcms2 2.12 hddcbb42_0 conda-forge ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge lerc 3.0 h9c3ff4c_0 conda-forge libblas 3.9.0 13_linux64_openblas conda-forge libbrotlicommon 1.0.9 h7f98852_6 conda-forge libbrotlidec 1.0.9 h7f98852_6 conda-forge libbrotlienc 1.0.9 h7f98852_6 conda-forge libcblas 3.9.0 13_linux64_openblas conda-forge libcurl 7.81.0 h2574ce0_0 conda-forge libdeflate 1.8 h7f98852_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 h516909a_1 conda-forge libevent 2.1.10 h9b69904_4 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 11.2.0 h1d223b6_12 conda-forge libgfortran-ng 11.2.0 h69a702a_12 conda-forge libgfortran5 11.2.0 h5c6108e_12 conda-forge libglib 2.70.2 h174f98d_1 conda-forge libgomp 11.2.0 h1d223b6_12 conda-forge libiconv 1.16 h516909a_0 conda-forge liblapack 3.9.0 13_linux64_openblas conda-forge libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge libnghttp2 1.43.0 h812cca2_1 conda-forge libnsl 2.0.0 h7f98852_0 conda-forge libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge libpng 1.6.37 h21135ba_2 conda-forge libprotobuf 3.19.3 h780b84a_0 conda-forge libsass 0.21.0 pypi_0 pypi libsodium 1.0.18 h36c2ea0_1 conda-forge libssh2 1.10.0 ha56f1ee_2 conda-forge libstdcxx-ng 11.2.0 he4da1e4_12 conda-forge libthrift 0.15.0 he6d91bd_1 conda-forge libtiff 4.3.0 h6f004c6_2 conda-forge libutf8proc 2.7.0 h7f98852_0 conda-forge libuuid 2.32.1 h7f98852_1000 conda-forge libwebp-base 1.2.2 h7f98852_1 conda-forge libxcb 1.13 h7f98852_1004 conda-forge libxml2 2.9.12 h885dcf4_1 conda-forge libxslt 1.1.33 h0ef7038_3 conda-forge libzip 1.8.0 h4de3113_1 conda-forge libzlib 1.2.11 h36c2ea0_1013 conda-forge lxml 4.7.1 py39h107f48f_0 conda-forge lz4-c 1.9.3 h9c3ff4c_1 conda-forge lzo 2.10 h516909a_1000 conda-forge markdown-it-py 2.0.1 pyhd8ed1ab_0 conda-forge markupsafe 2.0.1 py39h3811e60_1 conda-forge matplotlib-base 3.5.1 py39h2fa2bec_0 conda-forge matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge mdit-py-plugins 0.3.0 pyhd8ed1ab_0 conda-forge mdtraj 1.9.7 py39h138c130_1 conda-forge mdurl 0.1.0 pyhd8ed1ab_0 conda-forge mistune 0.8.4 py39h3811e60_1005 conda-forge mmpbsa-py 16.0 pypi_0 pypi mock 4.0.3 py39hf3d152e_2 conda-forge msgpack-python 1.0.3 py39h1a9c180_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge mypy_extensions 0.4.3 py39hf3d152e_4 conda-forge myst-parser 0.16.1 pyhd8ed1ab_0 conda-forge nbclient 0.5.10 pyhd8ed1ab_1 conda-forge nbconvert 6.4.0 py39hf3d152e_0 conda-forge nbformat 5.1.3 pyhd8ed1ab_0 conda-forge ncurses 6.3 h9c3ff4c_0 conda-forge nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge netcdf-fortran 4.5.4 nompi_h2b6e579_100 conda-forge networkx 2.6.3 pyhd8ed1ab_1 conda-forge nglview 3.0.3 pyh8a188c0_0 conda-forge nomkl 1.0 h5ca1d4c_0 conda-forge notebook 6.4.7 pyha770c72_0 conda-forge numexpr 2.8.0 py39hbd72853_100 conda-forge numpy 1.22.1 py39h91f2184_0 conda-forge ocl-icd 2.3.1 h7f98852_0 conda-forge ocl-icd-system 1.0.0 1 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openeye-toolkits 2021.2.0 py39_0 openeye openff-bespokefit 27rc4904+72.g60dd457 dev_0openff-forcefields 2.0.0 pyh6c4a22f_0 conda-forge openff-fragmenter-base 0.1.2 pyhd8ed1ab_0 conda-forge openff-qcsubmit 0.3.0 pyhd8ed1ab_0 conda-forge openff-sphinx-theme 0.0.32+115.gcb16b0d pypi_0 pypi openff-toolkit-base 0.10.2 pyhd8ed1ab_0 conda-forge openff-utilities 0.1.1 pyh6c4a22f_0 conda-forge openjpeg 2.4.0 hb52868f_1 conda-forge openmm 7.7.0 py39h792354b_0 conda-forge openssl 1.1.1l h7f98852_0 conda-forge orc 1.7.1 h1be678f_1 conda-forge packaging 21.3 pyhd8ed1ab_0 conda-forge packmol 20.010 h86c2bf4_0 conda-forge packmol-memgen 1.2.1rc0 pypi_0 pypi pandas 1.4.0 py39hde0f152_0 conda-forge pandoc 2.17.0.1 h7f98852_0 conda-forge pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge parmed 3.4.3 py39he80948d_1 conda-forge parquet-cpp 1.5.1 2 conda-forge parso 0.8.3 pyhd8ed1ab_0 conda-forge pathspec 0.9.0 pyhd8ed1ab_0 conda-forge pcre 8.45 h9c3ff4c_0 conda-forge pdb4amber 20.1 pypi_0 pypi perl 5.32.1 1_h7f98852_perl5 conda-forge pexpect 4.8.0 pyh9f0ad1d_2 conda-forge pickleshare 0.7.5 py_1003 conda-forge pillow 8.4.0 py39ha612740_0 conda-forge pint 0.18 pyhd8ed1ab_0 conda-forge pip 22.0.2 pyhd8ed1ab_0 conda-forge pixman 0.40.0 h36c2ea0_0 conda-forge platformdirs 2.3.0 pyhd8ed1ab_0 conda-forge plotly 5.5.0 pyhd8ed1ab_0 conda-forge pluggy 1.0.0 py39hf3d152e_2 conda-forge prometheus_client 0.12.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.24 pyha770c72_0 conda-forge prompt_toolkit 3.0.24 hd8ed1ab_0 conda-forge psutil 5.9.0 py39h3811e60_0 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge py 1.11.0 pyh6c4a22f_0 conda-forge py-cpuinfo 8.0.0 pyhd8ed1ab_0 conda-forge pyarrow 6.0.1 py39hff6fa39_5_cpu conda-forge pycairo 1.20.1 py39hedcb9fc_1 conda-forge pycparser 2.21 pyhd8ed1ab_0 conda-forge pydantic 1.9.0 py39h3811e60_0 conda-forge pygments 2.11.2 pyhd8ed1ab_0 conda-forge pymbar 3.0.5 py39hce5d2b2_3 conda-forge pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge pyparsing 3.0.7 pyhd8ed1ab_0 conda-forge pyrsistent 0.18.1 py39h3811e60_0 conda-forge pysocks 1.7.1 py39hf3d152e_4 conda-forge pytables 3.6.1 py39h2669a42_5 conda-forge pytest 6.2.5 py39hf3d152e_2 conda-forge python 3.9.10 h85951f9_1_cpython conda-forge python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge python-slugify 5.0.2 pypi_0 pypi python_abi 3.9 2_cp39 conda-forge pytraj 2.0.6 pypi_0 pypi pytz 2021.3 pyhd8ed1ab_0 conda-forge pyyaml 6.0 py39h3811e60_3 conda-forge pyzmq 22.3.0 py39h37b5a0c_1 conda-forge qcelemental 0.24.0 pyhd8ed1ab_0 conda-forge qcengine 0.22.0 pyhd8ed1ab_0 conda-forge qcportal 0.15.7 pyhd8ed1ab_0 conda-forge rdkit 2021.09.4 py39hccf6a74_0 conda-forge re2 2021.11.01 h9c3ff4c_0 conda-forge readline 8.1 h46c0cb4_0 conda-forge redis-py 4.1.2 pyhd8ed1ab_0 conda-forge redis-server 6.2.6 h9b69904_0 conda-forge regex 2022.1.18 py39h3811e60_0 conda-forge reportlab 3.5.68 py39he59360d_1 conda-forge requests 2.27.1 pyhd8ed1ab_0 conda-forge rfc3986 1.5.0 pyhd8ed1ab_0 conda-forge rich 11.1.0 pyhd8ed1ab_0 conda-forge s2n 1.3.0 h9b69904_0 conda-forge sander 16.0 pypi_0 pypi scipy 1.7.3 py39hee8e79c_0 conda-forge send2trash 1.8.0 pyhd8ed1ab_0 conda-forge setuptools 60.5.0 py39hf3d152e_0 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge smirnoff99frosst 1.1.0 pyh44b312d_0 conda-forge snappy 1.1.8 he1b5a44_3 conda-forge sniffio 1.2.0 py39hf3d152e_2 conda-forge snowballstemmer 2.2.0 pyhd8ed1ab_0 conda-forge soupsieve 2.3.1 pypi_0 pypi sphinx 4.4.0 pyh6c4a22f_1 conda-forge sphinx-click 3.0.3 pyhd8ed1ab_0 conda-forge sphinxcontrib-applehelp 1.0.2 py_0 conda-forge sphinxcontrib-devhelp 1.0.2 py_0 conda-forge sphinxcontrib-htmlhelp 2.0.0 pyhd8ed1ab_0 conda-forge sphinxcontrib-jsmath 1.0.1 py_0 conda-forge sphinxcontrib-qthelp 1.0.3 py_0 conda-forge sphinxcontrib-serializinghtml 1.1.5 pyhd8ed1ab_1 conda-forge sqlalchemy 1.4.31 py39h3811e60_0 conda-forge sqlite 3.37.0 h9cd32fc_0 conda-forge stack_data 0.1.4 pyhd8ed1ab_0 conda-forge starlette 0.14.2 pyhd8ed1ab_0 conda-forge tenacity 8.0.1 pyhd8ed1ab_0 conda-forge terminado 0.12.1 py39hf3d152e_1 conda-forge testpath 0.5.0 pyhd8ed1ab_0 conda-forge text-unidecode 1.3 pypi_0 pypi tk 8.6.11 h27826a3_1 conda-forge toml 0.10.2 pyhd8ed1ab_0 conda-forge tomli 1.2.2 pyhd8ed1ab_0 conda-forge tornado 6.1 py39h3811e60_2 conda-forge torsiondrive 1.1.0 pyhd8ed1ab_0 conda-forge tqdm 4.62.3 pyhd8ed1ab_0 conda-forge traitlets 5.1.1 pyhd8ed1ab_0 conda-forge typed-ast 1.5.2 py39h3811e60_0 conda-forge typing-extensions 4.0.1 hd8ed1ab_0 conda-forge typing_extensions 4.0.1 pyha770c72_0 conda-forge tzdata 2021e he74cb21_0 conda-forge unidecode 1.3.2 pypi_0 pypi urllib3 1.26.8 pyhd8ed1ab_1 conda-forge uvicorn 0.17.1 py39hf3d152e_0 conda-forge vine 5.0.0 pyhd8ed1ab_1 conda-forge wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge webencodings 0.5.1 py_1 conda-forge wheel 0.37.1 pyhd8ed1ab_0 conda-forge widgetsnbextension 3.5.2 py39hf3d152e_1 conda-forge wrapt 1.13.3 py39h3811e60_1 conda-forge xmltodict 0.12.0 py_0 conda-forge xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.0.10 h7f98852_0 conda-forge xorg-libsm 1.2.3 hd9c2040_1000 conda-forge xorg-libx11 1.7.2 h7f98852_0 conda-forge xorg-libxau 1.0.9 h7f98852_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h7f98852_1 conda-forge xorg-libxrender 0.9.10 h7f98852_1003 conda-forge xorg-libxt 1.2.1 h7f98852_2 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h7f98852_1002 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xtb 6.4.1 hf06ca72_2 conda-forge xtb-python 20.2 py39h3811e60_4 conda-forge xz 5.2.5 h516909a_1 conda-forge yaml 0.2.5 h7f98852_2 conda-forge zeromq 4.3.4 h9c3ff4c_1 conda-forge zipp 3.7.0 pyhd8ed1ab_0 conda-forge zlib 1.2.11 h36c2ea0_1013 conda-forge zstd 1.5.2 ha95c52a_0 conda-forge
Additional context
from_mapped_smiles()
exists, but my impression is that it's there to require a mapping (and raise an error if its incomplete). I think from_smiles()
should make a best effort attempt to get any mapped atoms in the right spot. I'm not sure its obvious that from_smiles
won't do this.
Also, BespokeFit (and possibly other downstream projects) allows users to specify a molecule with SMILES. I can imagine users of BespokeFit wanting to specify exact atom ordering, and being surprised when their mapped smiles neither raises an error nor reorders the atoms.
Ah, I think this is a documentation(/"it does what Jeff thinks it should do, he just didn't tell anyone what that is") problem. It's hard to specify "correct" behavior here because atom indices in SMILES don't have a universal meaning. Molecule.from_smiles
actually DOES read and store the map indices, it just considers them cosmetic.
from openff.toolkit.topology.molecule import Molecule
mol_from_smiles = Molecule.from_smiles("[H:1][O:3][H:2]")
print(mol_from_smiles.properties['atom_map'])
{0: 1, 1: 3, 2: 2}
This distinction in the API keeps us accepting of all input without needing to deal with a zillion corner cases (what would we do with the perfectly valid SMILES [H:1][H:1]
or [H:5][H]
?)
Do you have a sense for whether it'd be better to handle this on a docs or a programming level?
(possibly related: I kinda tried to figure out the "common" meaning of map indices in #522 but ran out of organizational bandwidth to carry it forward. )
Another user encountered this recently (https://github.com/openmm/spice-dataset/issues/16), @Yoshanuikabundi do you think we can make it more explicit in the documentation.
Another random thought, does adding an extra argument mapped=True
, similar to to_smiles(mapped=True)
, and pointing it to from_mapped_smiles()
internally work as an alternative?
I've also been bit by this - not resulting in broken production, but resulting in me looking goofy passing mapped SMILES to Molecule.from_smiles
. I didn't know from_mapped_smiles
exists and wouldn't have come to know it if Jeff didn't catch my error and point me to it - it would have been a silent error forever if I was an independent downstream developer or scientist. I don't think the current implementation addresses this - both because the stored atom maps are separated from the ordering of the atoms in the molecule and because I wouldn't have known they were there. (In all honesty I might not have pulled up the docs, but this behavior is not there/not visible.) Being a silent error/gotcha, I think a behavior change is justified alongside documentation changes.
I'd be naively in favor of any of the following solutions:
-
Molecule.from_smiles
checks to see if the SMILES is mapped as delegates to the the current functionality ofMolecule.from_mapped_files
, erroring out if it runs into confusing corner cases -
Molecule.from_smiles
checks to see if the SMILES is mapped as delegates to the the current functionality ofMolecule.from_mapped_files
, bailing out and processing it as un-mapped SMILES with a warning if something goes wrong while processing it as a mapped SMILES -
Molecule.from_smiles
acceptsmapped=True
(defaultFalse
) as a named argument, consolidating behavior in one method in a way that's similar toMolecule.to_smiles
Ran into this footgun again - okay sure, I should know better, but also I don't think silently ignoring or transforming information is how we should be treating inputs.
ipdb> Molecule.from_smiles("[H:2][O:1][H:3]").atoms
[Atom(name=, atomic number=1), Atom(name=, atomic number=8), Atom(name=, atomic number=1)]
ipdb> Molecule.from_smiles("[H:1][O:3][H:2]").atoms
[Atom(name=, atomic number=1), Atom(name=, atomic number=8), Atom(name=, atomic number=1)]
If the concern is handling ambiguous or ill-defined cases, why can't we just error out? I think that's what error handling is for
If the concern is handling ambiguous or ill-defined cases, why can't we just error out?
It's quite common to have fully mapped SMILES, where the map indices mean something other than "put the atoms in this order in a connection table" so raising an error would go too far. This does seem to be a really frequent footgun, though, even for experienced users. Would adding a warning be a good solution?
A warning would be great - until #522 is implemented I think the toolkit should be loud about what it does and does not handle
Cool - I'll accept a PR that emits a warning when a fully mapped SMILES is passed to from_smiles
!
I've been bitten by this so many times! Thank you!
#1474 and should land in 0.14.5 or 0.15.0