mdanalysis icon indicating copy to clipboard operation
mdanalysis copied to clipboard

Cannot load LAMMPS NetCDF trajectory files due to "header issue" in scipy.io.netcdf_file

Open bbye98 opened this issue 1 year ago • 5 comments

Expected behavior

I have run a LAMMPS simulation of a simple electrolyte–solvent systems confined between two planar dielectric electrodes. During the simulation, the topology is written to LAMMPS dump file cg_lj_coul_reduced_lammps.topology and the trajectory is written to a NetCDF file cg_lj_coul_reduced_lammps.nc.

I am attempting to load both files to a MDAnalysis.Universe object using

Python 3 code
import os

import MDAnalysis as mda

os.chdir("/mnt/e/research/test/poisson_potential/data")
fname = "cg_lj_coul_reduced_lammps"
universe = mda.Universe(f"{fname}.topology", f"{fname}.nc", topology_format="DATA", 
                        atom_style="id type q x y z vx vy vz")

Actual behavior

Instead, I get the following ValueError stemming from a header check in scipy.io.netcdf_file:

Python 3 stack trace
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[35], line 7
      5 os.chdir("/mnt/e/research/test/poisson_potential/data")
      6 fname = "cg_lj_coul_reduced_lammps"
----> 7 universe = mda.Universe(f"{fname}.topology", f"{fname}.nc", topology_format="DATA", 
      8                         atom_style="id type q x y z vx vy vz")

File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py:375, in Universe.__init__(self, topology, all_coordinates, format, topology_format, transformations, guess_bonds, vdwradii, fudge_factor, lower_bound, in_memory, in_memory_step, *coordinates, **kwargs)
    370 coordinates = _resolve_coordinates(self.filename, *coordinates,
    371                                    format=format,
    372                                    all_coordinates=all_coordinates)
    374 if coordinates:
--> 375     self.load_new(coordinates, format=format, in_memory=in_memory,
    376                 in_memory_step=in_memory_step, **kwargs)
    378 if transformations:
    379     if callable(transformations):

File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py:580, in Universe.load_new(self, filename, format, in_memory, in_memory_step, **kwargs)
    577 # supply number of atoms for readers that cannot do it for themselves
    578 kwargs['n_atoms'] = self.atoms.n_atoms
--> 580 self.trajectory = reader(filename, format=format, **kwargs)
    581 if self.trajectory.n_atoms != len(self.atoms):
    582     raise ValueError("The topology and {form} trajectory files don't"
    583                      " have the same number of atoms!\n"
    584                      "Topology number of atoms {top_n_atoms}\n"
   (...)
    588                          fname=filename,
    589                          trj_n_atoms=self.trajectory.n_atoms))

File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/lib/util.py:2553, in store_init_arguments.<locals>.wrapper(self, *args, **kwargs)
   2551             else:
   2552                 self._kwargs[key] = arg
-> 2553 return func(self, *args, **kwargs)

File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py:446, in NCDFReader.__init__(self, filename, n_atoms, mmap, **kwargs)
    443 super(NCDFReader, self).__init__(filename, **kwargs)
    445 # ensure maskandscale is off so we don't end up double scaling
--> 446 self.trjfile = NCDFPicklable(self.filename,
    447                              mmap=self._mmap,
    448                              maskandscale=False)
    450 # AMBER NetCDF files should always have a convention
    451 try:

File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:279, in netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale)
    276 self._attributes = {}
    278 if mode in 'ra':
--> 279     self._read()

File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:610, in netcdf_file._read(self)
    608 # Read file headers and set data.
    609 self._read_numrecs()
--> 610 self._read_dim_array()
    611 self._read_gatt_array()
    612 self._read_var_array()

File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:620, in netcdf_file._read_dim_array(self)
    618 header = self.fp.read(4)
    619 if header not in [ZERO, NC_DIMENSION]:
--> 620     raise ValueError("Unexpected header.")
    621 count = self._unpack_int()
    623 for dim in range(count):

ValueError: Unexpected header.

I get the same error when I try loading the trajectory file directly using scipy.io.netcdf_file, but not using netCDF4.Dataset:

Python 3 code
from netCDF4 import Dataset
from scipy.io import netcdf_file

print(Dataset(f"{fname}.nc", "r"))
print(netcdf_file(f"{fname}.nc", "r"))
Python 3 output + stack trace
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_DATA data model, file format NETCDF3):
    Conventions: AMBER
    ConventionVersion: 1.0
    program: LAMMPS
    programVersion: 3 Nov 2022
    dimensions(sizes): frame(1101), atom(1576), cell_spatial(3), cell_angular(3), label(10), spatial(3)
    variables(dimensions): |S1 spatial(spatial), |S1 cell_spatial(spatial), |S1 cell_angular(spatial, label), float32 time(frame), float32 cell_origin(frame, cell_spatial), float32 cell_lengths(frame, cell_spatial), float32 cell_angles(frame, cell_angular), int32 id(frame, atom), int32 type(frame, atom), float32 q(frame, atom), float32 coordinates(frame, atom, spatial), int32 ix(frame, atom), int32 iy(frame, atom), int32 iz(frame, atom), float32 velocities(frame, atom, spatial), float32 forces(frame, atom, spatial)
    groups: 

---------------------------------------------------------------------------
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[39], line 5
      2 from scipy.io import netcdf_file
      4 print(Dataset(f"{fname}.nc", "r"))
----> 5 print(netcdf_file(f"{fname}.nc", "r"))

File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:279, in netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale)
    276 self._attributes = {}
    278 if mode in 'ra':
--> 279     self._read()

File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:610, in netcdf_file._read(self)
    608 # Read file headers and set data.
    609 self._read_numrecs()
--> 610 self._read_dim_array()
    611 self._read_gatt_array()
    612 self._read_var_array()

File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:620, in netcdf_file._read_dim_array(self)
    618 header = self.fp.read(4)
    619 if header not in [ZERO, NC_DIMENSION]:
--> 620     raise ValueError("Unexpected header.")
    621 count = self._unpack_int()
    623 for dim in range(count):

ValueError: Unexpected header.

Code to reproduce the behavior

This is the LAMMPS script used to generate the topology and trajectory files:

LAMMPS script
# ==================================================================== #
# PARAMETERS AND SETTINGS                                              #
# ==================================================================== #

shell           cd /mnt/e/research/test/poisson_potential/data
log             cg_lj_coul_reduced_lammps.log

variable        N equal 1000
variable        T_reduced equal 1
variable        mass_reduced equal 1
variable        epsilon_reduced equal 1
variable        sigma_reduced equal 1
variable        x equal 0.5
variable        N_ion equal v_x*v_N/2
variable        q_reduced equal 13.625994230719545
variable        varepsilon_r equal 78

variable        L_x_reduced equal 8
variable        L_y_reduced equal 7.794228634059947
variable        L_z_reduced equal 20.046884346862004

variable        sigma_ion_wall_reduced equal v_sigma_reduced/2
variable        q_wall_reduced equal -0.19030322569642577
variable        unit_cell_x_reduced equal v_sigma_ion_wall_reduced/2
variable        unit_cell_y_reduced equal v_sigma_ion_wall_reduced*sqrt(3)/2
variable        unit_cell_z_reduced equal v_sigma_ion_wall_reduced*sqrt(6)/3

variable        every equal 1000
variable        timesteps equal 1100000

variable        rng1 equal 12798
variable        rng2 equal 9990
variable        rng3 equal 22263
variable        rng4 equal 101763

# ==================================================================== #
# SYSTEM AND PARTICLES                                                 #
# ==================================================================== #

boundary        p p f
atom_style      charge
region          box block 0 ${L_x_reduced} 0 ${L_y_reduced} 0 ${L_z_reduced} units box
create_box      5 box

mass            * ${mass_reduced}
region          electrolyte block 0 ${L_x_reduced} 0 ${L_y_reduced} ${sigma_ion_wall_reduced} $(v_L_z_reduced-v_sigma_ion_wall_reduced) units box
create_atoms    1 random ${N_ion} ${rng1} electrolyte
create_atoms    2 random ${N_ion} ${rng2} electrolyte
create_atoms    3 random $(v_N-2*v_N_ion) ${rng3} electrolyte
set             type 1 charge ${q_reduced}
set             type 2 charge -${q_reduced}
set             type 3 charge 0
group           electrolyte type 1 2 3

lattice         hcp $((v_unit_cell_z_reduced/0.916486)^-3)
region          left_wall block 0 ${L_x_reduced} 0 ${L_y_reduced} 0 ${unit_cell_z_reduced} units box
create_atoms    4 region left_wall
lattice         hcp $((v_unit_cell_z_reduced/0.916486)^-3) origin 0 0 0.552307414567
region          right_wall block 0 ${L_x_reduced} 0 ${L_y_reduced} $(v_L_z_reduced-v_unit_cell_z_reduced) ${L_z_reduced} units box
create_atoms    5 region right_wall
set             type 4 charge 0
set             type 5 charge 0
group           electrodes type 4 5

# ==================================================================== #
# FORCE FIELD                                                          #
# ==================================================================== #

dielectric      ${varepsilon_r}
kspace_style    pppm 1e-4
kspace_modify   slab 3

pair_style      lj/cut/coul/long $(2^(1/6)*v_sigma_reduced)
pair_coeff      *3 *3 ${epsilon_reduced} ${sigma_reduced}
pair_coeff      4* 4* 0 0
pair_coeff      *3 4* $(100*v_epsilon_reduced) ${sigma_ion_wall_reduced} $(2^(1/6)*v_sigma_ion_wall_reduced)
pair_modify     shift yes

velocity        electrolyte create ${T_reduced} ${rng4}
velocity        electrodes set 0 0 0
fix             nvt electrolyte nvt temp ${T_reduced} ${T_reduced} $(100*dt)
fix             electrodes electrodes setforce 0 0 0

# ==================================================================== #
# ENERGY MINIMIZATION                                                  #
# ==================================================================== #

minimize        1e-6 1e-6 1000 1000
reset_timestep  0
set             type 4 charge ${q_wall_reduced}
set             type 5 charge $(-v_q_wall_reduced)

# ==================================================================== #
# SIMULATION                                                           #
# ==================================================================== #

write_data      cg_lj_coul_reduced_lammps.topology
dump            dump all netcdf ${every} cg_lj_coul_reduced_lammps.nc id type q x y z ix iy iz vx vy vz fx fy fz
thermo          ${every}
thermo_style    custom step spcpu cpuremain temp press ke evdwl ecoul elong
run             ${timesteps}

I do not believe this error is related to the NetCDF format, as the NetCDFFile/NetCDFReporter I wrote for OpenMM also uses the same format (NETCDF3_64BIT_DATA), but is recognized correctly by both netCDF4.Dataset and scipy.io.netcdf_file, and thus MDAnalysis.

Current version of MDAnalysis

  • Which version are you using? MDAnalysis 2.7.0
  • Which version of Python? Python 3.9.18
  • Which operating system? Ubuntu 20.04 LTS (5.10.102.1-microsoft-standard-WSL2)

bbye98 avatar Jan 30 '24 22:01 bbye98

Thanks for raising this issue @bbye98, unfortunately we've not had time to look into this yet.

Is there any way you could share a small version of a non-working netcdf file? I personally don't readily have access to (or the knowledge to use) LAMMPS and wouldn't be able to reproduce the file easily.

IAlibay avatar Feb 11 '24 09:02 IAlibay

@IAlibay Apologies for the late response! I've attached a sample topology and trajectory file below (from a slab LJ-Coulomb system):

example.zip

Some information on package versions:

❯ lmp
LAMMPS (21 Nov 2023)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task

❯ python -c "import MDAnalysis as mda; print(mda.__version__)"
2.7.0

❯ python -c "import scipy; print(scipy.__version__)"
1.12.0

Reading the LAMMPS trajectory file using MDAnalysis.Universe:

>>> u = mda.Universe("example.nc")
Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 64, in parse
    n_atoms = kwargs['n_atoms']
KeyError: 'n_atoms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 110, in _topology_from_file_like
    topology = p.parse(**kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 67, in parse
    n_atoms = reader.parse_n_atoms(self.filename, **kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py", line 621, in parse_n_atoms
    with scipy.io.netcdf_file(filename, mmap=None) as f:
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
    self._read()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
    self._read_dim_array()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
    raise ValueError("Unexpected header.")
ValueError: Unexpected header.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 356, in __init__
    topology = _topology_from_file_like(self.filename,
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 125, in _topology_from_file_like
    raise ValueError(
ValueError: Failed to construct topology from file example.nc with parser <class 'MDAnalysis.topology.MinimalParser.MinimalParser'>.
Error: Unexpected header.

Reading the LAMMPS trajectory file using scipy.io.netcdf_file:

>>> u = mda.Universe("example.nc")
Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 64, in parse
    n_atoms = kwargs['n_atoms']
KeyError: 'n_atoms'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 110, in _topology_from_file_like
    topology = p.parse(**kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 67, in parse
    n_atoms = reader.parse_n_atoms(self.filename, **kwargs)
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py", line 621, in parse_n_atoms
    with scipy.io.netcdf_file(filename, mmap=None) as f:
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
    self._read()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
    self._read_dim_array()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
    raise ValueError("Unexpected header.")
ValueError: Unexpected header.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 356, in __init__
    topology = _topology_from_file_like(self.filename,
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 125, in _topology_from_file_like
    raise ValueError(
ValueError: Failed to construct topology from file example.nc with parser <class 'MDAnalysis.topology.MinimalParser.MinimalParser'>.
Error: Unexpected header.
>>> from scipy.io import netcdf_file
>>> nc = netcdf_file("example.nc", "r")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
    self._read()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
    self._read_dim_array()
  File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
    raise ValueError("Unexpected header.")
ValueError: Unexpected header.

Reading the LAMMPS trajectory file using netCDF4.Dataset:

>>> import netCDF4 as nc
>>> nc_ = nc.Dataset("example.nc")
>>> nc_
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_DATA data model, file format NETCDF3):
    Conventions: AMBER
    ConventionVersion: 1.0
    program: LAMMPS
    programVersion: 21 Nov 2023
    dimensions(sizes): frame(11), atom(1576), cell_spatial(3), cell_angular(3), label(10), spatial(3)
    variables(dimensions): |S1 spatial(spatial), |S1 cell_spatial(spatial), |S1 cell_angular(spatial, label), float32 time(frame), float32 cell_origin(frame, cell_spatial), float32 cell_lengths(frame, cell_spatial), float32 cell_angles(frame, cell_angular), int32 id(frame, atom), int32 type(frame, atom), float32 q(frame, atom), float32 coordinates(frame, atom, spatial), int32 ix(frame, atom), int32 iy(frame, atom), int32 iz(frame, atom), float32 velocities(frame, atom, spatial), float32 forces(frame, atom, spatial)
    groups:

For what it's worth, I recently dropped support for scipy.io.netcdf_file altogether in favor of netCDF4.Dataset in my workflow because the former had bizarre behavior when used as an OpenMM trajectory writer, such as randomly writing the first frame of the simulation three times--once at the beginning and twice at the end of the trajectory file--despite correctly flushing/syncing and closing the NetCDF file at the end of the simulation. The SciPy implementation also doesn't report the length of an "unlimited" dimension (like time), unlike netCDF4.

bbye98 avatar Feb 12 '24 22:02 bbye98

This file is a bit strange: it starts with CDF5 instead of the usual and expected CDF2. Reading around, CDF2 seems to indicate 64-bit offsets (which are required by the format: https://ambermd.org/netcdf/nctraj.xhtml), and CDF5 seems to indicate 64-bit data and offsets (which would also moves the header around, explaining the error from scipy).

The code for this seems to have been around for a while in the netcdf-c library, not sure what caused this change in the output of LAMMPS. Maybe something about the specific version of netcdf LAMMPS is using?

Luthaf avatar Mar 14 '24 12:03 Luthaf

@bbye98 did you find out if anything changed in the LAMMPS netcdf output?

orbeckst avatar Mar 25 '24 19:03 orbeckst

Hi! Apologies for the late response. I'm not sure which version of NetCDF LAMMPS is using, but I can try installing an older version of LAMMPS (say, from 2021) to test.

bbye98 avatar Mar 25 '24 19:03 bbye98