openff-toolkit
openff-toolkit copied to clipboard
`from_file` methods should be able to take pathlib paths
Is your feature request related to a problem? Please describe. Trying to load a molecule with the following code:
from pathlib import Path
ligand_path = Path('chembl_1078774.sdf')
ligand = Molecule.from_file(ligand_path)
Gives a cryptic error message
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-24-8869d92f3423> in <module>
1 ligand_path = Path('chembl_1078774.sdf')
----> 2 ligand = Molecule.from_file(ligand_path)
~/Documents/openff/openff-toolkit/openff/toolkit/topology/molecule.py in from_file(cls, file_path, file_format, toolkit_registry, allow_undefined_stereo)
4165 if not (isinstance(file_path, str)):
4166 raise Exception(
-> 4167 "If providing a file-like object for reading molecules, the format must be specified"
4168 )
4169 # Assume that files ending in ".gz" should use their second-to-last suffix for compatibility check
Exception: If providing a file-like object for reading molecules, the format must be specified
pathlib.Path is a convenient wrapper type from the standard library for dealing with paths. Anywhere in the standard library that accepts a str as a file path that does not accept a Path is considered a bug.
Figuring out the correct solution is quite difficult, as the error message is basically irrelevant - Paths are not file-like objects. The fix suggested by the error message does not work, and gives another confusing error message:
from pathlib import Path
ligand_path = Path('chembl_1078774.sdf')
ligand = Molecule.from_file(ligand_path, file_format="SDF")
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-27-0f0fb3a6f68a> in <module>
1 from pathlib import Path
2 ligand_path = Path('chembl_1078774.sdf')
----> 3 ligand = Molecule.from_file(ligand_path, file_format="SDF")
~/Documents/openff/openff-toolkit/openff/toolkit/topology/molecule.py in from_file(cls, file_path, file_format, toolkit_registry, allow_undefined_stereo)
4249
4250 if len(mols) == 0:
-> 4251 raise Exception("Unable to read molecule from file: {}".format(file_path))
4252 elif len(mols) == 1:
4253 return mols[0]
Exception: Unable to read molecule from file: chembl_1078774.sdf
Describe the solution you'd like
Molecule.from_file and similar methods should be able to take Path objects.
Describe alternatives you've considered It's not super difficult for the end user to fix this, if they are familiar with this kind of bug and correctly ignore the error message:
from pathlib import Path
ligand_path = Path('chembl_1078774.sdf')
ligand = Molecule.from_file(str(ligand_path))
I'm in support of this sort of change (#740). I think the last time I tried to implement this, I was somewhat intimidated but the number of places in the codebase this would need to be implemented, so I didn't get very far.
I think this is done
In [6]: Molecule("tmp.sdf")
Out[6]: NGLWidget()
In [7]: Molecule(pathlib.Path("tmp.sdf"))
Out[7]: NGLWidget()