mdtraj icon indicating copy to clipboard operation
mdtraj copied to clipboard

NETCDF4 Error

Open jeremyleung521 opened this issue 1 year ago • 10 comments

If I have netCDF4 installed, I get the following error with mdtraj v1.10.0 when I try to load an .nc file. Probably related to the changes in #1878 . The nc file is created from a pdb file with mdtraj. This works fine in v1.9.9 (when I downgraded).

@janash

----> 1 a = mdtraj.load('ntl9.nc', top='ntl9.pdb')

File ~/miniconda3/envs/westpa-workshop2024/lib/python3.10/site-packages/mdtraj/core/trajectory.py:433, in load(filename_or_filenames, discard_overlapping_frames, **kwargs)
    423 filename_or_filenames = filename_or_filenames[1:]  # ignore first file
    424 try:
    425     # this is a little hack that makes calling load() more predictable. since
    426     # most of the loaders take a kwargs "top" except for load_hdf5, (since
   (...)
    431     # TODO make all the loaders accept a pre parsed topology (top) in order to avoid
    432     # this part and have a more consistent interface and a faster load function
--> 433     t = loader(tmp_file, **kwargs)
    435 except TypeError as e:
    436     # Don't want to intercept legit
    437     # TypeErrors
    438     if "got an unexpected keyword argument 'top'" not in str(e):

File ~/miniconda3/envs/westpa-workshop2024/lib/python3.10/site-packages/mdtraj/formats/netcdf.py:91, in load_netcdf(filename, top, stride, atom_indices, frame)
     88 topology = _parse_topology(top)
     89 atom_indices = cast_indices(atom_indices)
---> 91 with NetCDFTrajectoryFile(filename) as f:
     92     if frame is not None:
     93         f.seek(frame)

File ~/miniconda3/envs/westpa-workshop2024/lib/python3.10/site-packages/mdtraj/formats/netcdf.py:169, in NetCDFTrajectoryFile.__init__(self, filename, mode, force_overwrite)
    166 if mode == "w" and not force_overwrite and os.path.exists(filename):
    167     raise OSError('"%s" already exists' % filename)
--> 169 self._handle = netcdf(filename, mode=mode, **input_args)
    170 self._closed = False
    172 # self._frame_index is the current frame that we're at in the
    173 #     file
    174 # self._needs_initialization indicates whether we need to set the
    175 #     global properties of the file. This is required before the first
    176 #     write operation on a new file

TypeError: netcdf_file.__init__() got an unexpected keyword argument 'format'

jeremyleung521 avatar Jun 03 '24 16:06 jeremyleung521

After digging through it, it must be some shoddy merging because this line, which is no longer in the branch, resurfaced, which is causing all sorts of error.

https://github.com/mdtraj/mdtraj/blob/master/mdtraj/formats/netcdf.py#L161

jeremyleung521 avatar Jun 03 '24 16:06 jeremyleung521

Plausible, there were lots of moving parts around then - do you might removing that line and seeing if your file gets loaded as expected? (Should this have been caught by a test?)

mattwthompson avatar Jun 03 '24 18:06 mattwthompson

Argh that's annoying - sounds like it's quite literally a single line fix and deleting it would fix things?

Do you have a test case/code to try and verify fail/success on? Or is a = mdtraj.load('ntl9.nc', top='ntl9.pdb') sufficient to verify? If it's a single line fix we can just try patching it out

sukritsingh avatar Jun 03 '24 18:06 sukritsingh

Plausible, there were lots of moving parts around then - do you might removing that line and seeing if your file gets loaded as expected? (Should this have been caught by a test?)

Without netCDF4 installed it wouldn't error out. I'm assuming the CI env file does not have netCDF4 installed.

Argh that's annoying - sounds like it's quite literally a single line fix and deleting it would fix things?

Do you have a test case/code to try and verify fail/success on? Or is a = mdtraj.load('ntl9.nc', top='ntl9.pdb') sufficient to verify? If it's a single line fix we can just try patching it out

Seems like a single line (and with netcdf4 installed) is sufficient to test it. And removing that line fixes it. It's essentially prepping the arguments for netcdf4, but then that extra line tells mdtraj to go back to scipy, which lead to the error.

jeremyleung521 avatar Jun 03 '24 18:06 jeremyleung521

(Should this have been caught by a test?)

It looks like test_netcdf.py has some tests that evaluate md.load() implicitly but nothing is directly testing md.load()....Maybe that's also an addition we make in the fix? Both remove that line and add that additional md.load() test?

sukritsingh avatar Jun 03 '24 18:06 sukritsingh

I'm assuming the CI env file does not have netCDF4 installed.

I think if our recommendation is to use this, it should be installed in CI

mattwthompson avatar Jun 03 '24 18:06 mattwthompson

Without netCDF4 installed it wouldn't error out. I'm assuming the CI env file does not have netCDF4 installed.

We should list netCDF4 as a requirement for the environment then, no? It's a requirement clearly for this to function....

Seems like a single line (and with netcdf4 installed) is sufficient to test it. And removing that line fixes it. It's essentially prepping the arguments for netcdf4, but then that extra line tells mdtraj to go back to scipy, which lead to the error.

Well that's good to hear - at the very least it sounds like a quick fix of removing the line

Should we have installing netCDF4 in CI + adding a test for md.load() in a separate PR?

sukritsingh avatar Jun 03 '24 18:06 sukritsingh

Without netCDF4 installed it wouldn't error out. I'm assuming the CI env file does not have netCDF4 installed.

We should list netCDF4 as a requirement for the environment then, no? It's a requirement clearly for this to function....

Seems like a single line (and with netcdf4 installed) is sufficient to test it. And removing that line fixes it. It's essentially prepping the arguments for netcdf4, but then that extra line tells mdtraj to go back to scipy, which lead to the error.

Well that's good to hear - at the very least it sounds like a quick fix of removing the line

Should we have installing netCDF4 in CI + adding a test for md.load() in a separate PR?

The md.load() test already exists and I think the NetCDFTrajectoryFile object should already test everything as long as netcdf4 is installed. Maybe a separate environment (or some monkey patching to mask netcdf4) should exist to test the scipy portion.

jeremyleung521 avatar Jun 03 '24 18:06 jeremyleung521

Maybe a separate environment (or some monkey patching to mask netcdf4) should exist to test the scipy portion.

This sounds like a dangerous idea but something worth discussing lol. Is this something folks use? I confess I barely use .nc files as is anyways.

In the meantime I've opened up a PR at #1894 to address this single-line-fix plus adding netCDF4 to the CI!

sukritsingh avatar Jun 03 '24 18:06 sukritsingh

Hi all, I'm at a workshop today, so I've just seen the emails come in! I'm glad we've found a solution has been found to the problem.

Concerning testing, this may be a good time to use mocking to simulate dependency absences - https://docs.python.org/3/library/unittest.mock.html

janash avatar Jun 03 '24 19:06 janash

This is fixed in 1.10.1. Closing this.

jeremyleung521 avatar Nov 11 '24 18:11 jeremyleung521