openff-toolkit icon indicating copy to clipboard operation
openff-toolkit copied to clipboard

`from_file` methods should be able to take pathlib paths

Open Yoshanuikabundi opened this issue 4 years ago • 1 comments

Is your feature request related to a problem? Please describe. Trying to load a molecule with the following code:

from pathlib import Path
ligand_path = Path('chembl_1078774.sdf')
ligand = Molecule.from_file(ligand_path)

Gives a cryptic error message

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-24-8869d92f3423> in <module>
      1 ligand_path = Path('chembl_1078774.sdf')
----> 2 ligand = Molecule.from_file(ligand_path)

~/Documents/openff/openff-toolkit/openff/toolkit/topology/molecule.py in from_file(cls, file_path, file_format, toolkit_registry, allow_undefined_stereo)
   4165             if not (isinstance(file_path, str)):
   4166                 raise Exception(
-> 4167                     "If providing a file-like object for reading molecules, the format must be specified"
   4168                 )
   4169             # Assume that files ending in ".gz" should use their second-to-last suffix for compatibility check

Exception: If providing a file-like object for reading molecules, the format must be specified

pathlib.Path is a convenient wrapper type from the standard library for dealing with paths. Anywhere in the standard library that accepts a str as a file path that does not accept a Path is considered a bug.

Figuring out the correct solution is quite difficult, as the error message is basically irrelevant - Paths are not file-like objects. The fix suggested by the error message does not work, and gives another confusing error message:

from pathlib import Path
ligand_path = Path('chembl_1078774.sdf')
ligand = Molecule.from_file(ligand_path, file_format="SDF")
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-27-0f0fb3a6f68a> in <module>
      1 from pathlib import Path
      2 ligand_path = Path('chembl_1078774.sdf')
----> 3 ligand = Molecule.from_file(ligand_path, file_format="SDF")

~/Documents/openff/openff-toolkit/openff/toolkit/topology/molecule.py in from_file(cls, file_path, file_format, toolkit_registry, allow_undefined_stereo)
   4249 
   4250         if len(mols) == 0:
-> 4251             raise Exception("Unable to read molecule from file: {}".format(file_path))
   4252         elif len(mols) == 1:
   4253             return mols[0]

Exception: Unable to read molecule from file: chembl_1078774.sdf

Describe the solution you'd like Molecule.from_file and similar methods should be able to take Path objects.

Describe alternatives you've considered It's not super difficult for the end user to fix this, if they are familiar with this kind of bug and correctly ignore the error message:

from pathlib import Path
ligand_path = Path('chembl_1078774.sdf')
ligand = Molecule.from_file(str(ligand_path))

Yoshanuikabundi avatar May 19 '21 08:05 Yoshanuikabundi

I'm in support of this sort of change (#740). I think the last time I tried to implement this, I was somewhat intimidated but the number of places in the codebase this would need to be implemented, so I didn't get very far.

mattwthompson avatar May 19 '21 12:05 mattwthompson

I think this is done

In [6]: Molecule("tmp.sdf")
Out[6]: NGLWidget()

In [7]: Molecule(pathlib.Path("tmp.sdf"))
Out[7]: NGLWidget()

mattwthompson avatar Oct 13 '23 14:10 mattwthompson