mdanalysis
mdanalysis copied to clipboard
Cannot load LAMMPS NetCDF trajectory files due to "header issue" in scipy.io.netcdf_file
Expected behavior
I have run a LAMMPS simulation of a simple electrolyte–solvent systems confined between two planar dielectric electrodes. During the simulation, the topology is written to LAMMPS dump file cg_lj_coul_reduced_lammps.topology and the trajectory is written to a NetCDF file cg_lj_coul_reduced_lammps.nc.
I am attempting to load both files to a MDAnalysis.Universe object using
Python 3 code
import os
import MDAnalysis as mda
os.chdir("/mnt/e/research/test/poisson_potential/data")
fname = "cg_lj_coul_reduced_lammps"
universe = mda.Universe(f"{fname}.topology", f"{fname}.nc", topology_format="DATA",
atom_style="id type q x y z vx vy vz")
Actual behavior
Instead, I get the following ValueError stemming from a header check in scipy.io.netcdf_file:
Python 3 stack trace
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[35], line 7
5 os.chdir("/mnt/e/research/test/poisson_potential/data")
6 fname = "cg_lj_coul_reduced_lammps"
----> 7 universe = mda.Universe(f"{fname}.topology", f"{fname}.nc", topology_format="DATA",
8 atom_style="id type q x y z vx vy vz")
File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py:375, in Universe.__init__(self, topology, all_coordinates, format, topology_format, transformations, guess_bonds, vdwradii, fudge_factor, lower_bound, in_memory, in_memory_step, *coordinates, **kwargs)
370 coordinates = _resolve_coordinates(self.filename, *coordinates,
371 format=format,
372 all_coordinates=all_coordinates)
374 if coordinates:
--> 375 self.load_new(coordinates, format=format, in_memory=in_memory,
376 in_memory_step=in_memory_step, **kwargs)
378 if transformations:
379 if callable(transformations):
File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py:580, in Universe.load_new(self, filename, format, in_memory, in_memory_step, **kwargs)
577 # supply number of atoms for readers that cannot do it for themselves
578 kwargs['n_atoms'] = self.atoms.n_atoms
--> 580 self.trajectory = reader(filename, format=format, **kwargs)
581 if self.trajectory.n_atoms != len(self.atoms):
582 raise ValueError("The topology and {form} trajectory files don't"
583 " have the same number of atoms!\n"
584 "Topology number of atoms {top_n_atoms}\n"
(...)
588 fname=filename,
589 trj_n_atoms=self.trajectory.n_atoms))
File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/lib/util.py:2553, in store_init_arguments.<locals>.wrapper(self, *args, **kwargs)
2551 else:
2552 self._kwargs[key] = arg
-> 2553 return func(self, *args, **kwargs)
File ~/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py:446, in NCDFReader.__init__(self, filename, n_atoms, mmap, **kwargs)
443 super(NCDFReader, self).__init__(filename, **kwargs)
445 # ensure maskandscale is off so we don't end up double scaling
--> 446 self.trjfile = NCDFPicklable(self.filename,
447 mmap=self._mmap,
448 maskandscale=False)
450 # AMBER NetCDF files should always have a convention
451 try:
File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:279, in netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale)
276 self._attributes = {}
278 if mode in 'ra':
--> 279 self._read()
File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:610, in netcdf_file._read(self)
608 # Read file headers and set data.
609 self._read_numrecs()
--> 610 self._read_dim_array()
611 self._read_gatt_array()
612 self._read_var_array()
File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:620, in netcdf_file._read_dim_array(self)
618 header = self.fp.read(4)
619 if header not in [ZERO, NC_DIMENSION]:
--> 620 raise ValueError("Unexpected header.")
621 count = self._unpack_int()
623 for dim in range(count):
ValueError: Unexpected header.
I get the same error when I try loading the trajectory file directly using scipy.io.netcdf_file, but not using netCDF4.Dataset:
Python 3 code
from netCDF4 import Dataset
from scipy.io import netcdf_file
print(Dataset(f"{fname}.nc", "r"))
print(netcdf_file(f"{fname}.nc", "r"))
Python 3 output + stack trace
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_DATA data model, file format NETCDF3):
Conventions: AMBER
ConventionVersion: 1.0
program: LAMMPS
programVersion: 3 Nov 2022
dimensions(sizes): frame(1101), atom(1576), cell_spatial(3), cell_angular(3), label(10), spatial(3)
variables(dimensions): |S1 spatial(spatial), |S1 cell_spatial(spatial), |S1 cell_angular(spatial, label), float32 time(frame), float32 cell_origin(frame, cell_spatial), float32 cell_lengths(frame, cell_spatial), float32 cell_angles(frame, cell_angular), int32 id(frame, atom), int32 type(frame, atom), float32 q(frame, atom), float32 coordinates(frame, atom, spatial), int32 ix(frame, atom), int32 iy(frame, atom), int32 iz(frame, atom), float32 velocities(frame, atom, spatial), float32 forces(frame, atom, spatial)
groups:
---------------------------------------------------------------------------
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[39], line 5
2 from scipy.io import netcdf_file
4 print(Dataset(f"{fname}.nc", "r"))
----> 5 print(netcdf_file(f"{fname}.nc", "r"))
File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:279, in netcdf_file.__init__(self, filename, mode, mmap, version, maskandscale)
276 self._attributes = {}
278 if mode in 'ra':
--> 279 self._read()
File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:610, in netcdf_file._read(self)
608 # Read file headers and set data.
609 self._read_numrecs()
--> 610 self._read_dim_array()
611 self._read_gatt_array()
612 self._read_var_array()
File ~/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py:620, in netcdf_file._read_dim_array(self)
618 header = self.fp.read(4)
619 if header not in [ZERO, NC_DIMENSION]:
--> 620 raise ValueError("Unexpected header.")
621 count = self._unpack_int()
623 for dim in range(count):
ValueError: Unexpected header.
Code to reproduce the behavior
This is the LAMMPS script used to generate the topology and trajectory files:
LAMMPS script
# ==================================================================== #
# PARAMETERS AND SETTINGS #
# ==================================================================== #
shell cd /mnt/e/research/test/poisson_potential/data
log cg_lj_coul_reduced_lammps.log
variable N equal 1000
variable T_reduced equal 1
variable mass_reduced equal 1
variable epsilon_reduced equal 1
variable sigma_reduced equal 1
variable x equal 0.5
variable N_ion equal v_x*v_N/2
variable q_reduced equal 13.625994230719545
variable varepsilon_r equal 78
variable L_x_reduced equal 8
variable L_y_reduced equal 7.794228634059947
variable L_z_reduced equal 20.046884346862004
variable sigma_ion_wall_reduced equal v_sigma_reduced/2
variable q_wall_reduced equal -0.19030322569642577
variable unit_cell_x_reduced equal v_sigma_ion_wall_reduced/2
variable unit_cell_y_reduced equal v_sigma_ion_wall_reduced*sqrt(3)/2
variable unit_cell_z_reduced equal v_sigma_ion_wall_reduced*sqrt(6)/3
variable every equal 1000
variable timesteps equal 1100000
variable rng1 equal 12798
variable rng2 equal 9990
variable rng3 equal 22263
variable rng4 equal 101763
# ==================================================================== #
# SYSTEM AND PARTICLES #
# ==================================================================== #
boundary p p f
atom_style charge
region box block 0 ${L_x_reduced} 0 ${L_y_reduced} 0 ${L_z_reduced} units box
create_box 5 box
mass * ${mass_reduced}
region electrolyte block 0 ${L_x_reduced} 0 ${L_y_reduced} ${sigma_ion_wall_reduced} $(v_L_z_reduced-v_sigma_ion_wall_reduced) units box
create_atoms 1 random ${N_ion} ${rng1} electrolyte
create_atoms 2 random ${N_ion} ${rng2} electrolyte
create_atoms 3 random $(v_N-2*v_N_ion) ${rng3} electrolyte
set type 1 charge ${q_reduced}
set type 2 charge -${q_reduced}
set type 3 charge 0
group electrolyte type 1 2 3
lattice hcp $((v_unit_cell_z_reduced/0.916486)^-3)
region left_wall block 0 ${L_x_reduced} 0 ${L_y_reduced} 0 ${unit_cell_z_reduced} units box
create_atoms 4 region left_wall
lattice hcp $((v_unit_cell_z_reduced/0.916486)^-3) origin 0 0 0.552307414567
region right_wall block 0 ${L_x_reduced} 0 ${L_y_reduced} $(v_L_z_reduced-v_unit_cell_z_reduced) ${L_z_reduced} units box
create_atoms 5 region right_wall
set type 4 charge 0
set type 5 charge 0
group electrodes type 4 5
# ==================================================================== #
# FORCE FIELD #
# ==================================================================== #
dielectric ${varepsilon_r}
kspace_style pppm 1e-4
kspace_modify slab 3
pair_style lj/cut/coul/long $(2^(1/6)*v_sigma_reduced)
pair_coeff *3 *3 ${epsilon_reduced} ${sigma_reduced}
pair_coeff 4* 4* 0 0
pair_coeff *3 4* $(100*v_epsilon_reduced) ${sigma_ion_wall_reduced} $(2^(1/6)*v_sigma_ion_wall_reduced)
pair_modify shift yes
velocity electrolyte create ${T_reduced} ${rng4}
velocity electrodes set 0 0 0
fix nvt electrolyte nvt temp ${T_reduced} ${T_reduced} $(100*dt)
fix electrodes electrodes setforce 0 0 0
# ==================================================================== #
# ENERGY MINIMIZATION #
# ==================================================================== #
minimize 1e-6 1e-6 1000 1000
reset_timestep 0
set type 4 charge ${q_wall_reduced}
set type 5 charge $(-v_q_wall_reduced)
# ==================================================================== #
# SIMULATION #
# ==================================================================== #
write_data cg_lj_coul_reduced_lammps.topology
dump dump all netcdf ${every} cg_lj_coul_reduced_lammps.nc id type q x y z ix iy iz vx vy vz fx fy fz
thermo ${every}
thermo_style custom step spcpu cpuremain temp press ke evdwl ecoul elong
run ${timesteps}
I do not believe this error is related to the NetCDF format, as the NetCDFFile/NetCDFReporter I wrote for OpenMM also uses the same format (NETCDF3_64BIT_DATA), but is recognized correctly by both netCDF4.Dataset and scipy.io.netcdf_file, and thus MDAnalysis.
Current version of MDAnalysis
- Which version are you using? MDAnalysis 2.7.0
- Which version of Python? Python 3.9.18
- Which operating system? Ubuntu 20.04 LTS (5.10.102.1-microsoft-standard-WSL2)
Thanks for raising this issue @bbye98, unfortunately we've not had time to look into this yet.
Is there any way you could share a small version of a non-working netcdf file? I personally don't readily have access to (or the knowledge to use) LAMMPS and wouldn't be able to reproduce the file easily.
@IAlibay Apologies for the late response! I've attached a sample topology and trajectory file below (from a slab LJ-Coulomb system):
Some information on package versions:
❯ lmp
LAMMPS (21 Nov 2023)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
❯ python -c "import MDAnalysis as mda; print(mda.__version__)"
2.7.0
❯ python -c "import scipy; print(scipy.__version__)"
1.12.0
Reading the LAMMPS trajectory file using MDAnalysis.Universe:
>>> u = mda.Universe("example.nc")
Traceback (most recent call last):
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 64, in parse
n_atoms = kwargs['n_atoms']
KeyError: 'n_atoms'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 110, in _topology_from_file_like
topology = p.parse(**kwargs)
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 67, in parse
n_atoms = reader.parse_n_atoms(self.filename, **kwargs)
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py", line 621, in parse_n_atoms
with scipy.io.netcdf_file(filename, mmap=None) as f:
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
self._read()
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
self._read_dim_array()
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
raise ValueError("Unexpected header.")
ValueError: Unexpected header.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 356, in __init__
topology = _topology_from_file_like(self.filename,
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 125, in _topology_from_file_like
raise ValueError(
ValueError: Failed to construct topology from file example.nc with parser <class 'MDAnalysis.topology.MinimalParser.MinimalParser'>.
Error: Unexpected header.
Reading the LAMMPS trajectory file using scipy.io.netcdf_file:
>>> u = mda.Universe("example.nc")
Traceback (most recent call last):
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 64, in parse
n_atoms = kwargs['n_atoms']
KeyError: 'n_atoms'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 110, in _topology_from_file_like
topology = p.parse(**kwargs)
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/topology/MinimalParser.py", line 67, in parse
n_atoms = reader.parse_n_atoms(self.filename, **kwargs)
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/coordinates/TRJ.py", line 621, in parse_n_atoms
with scipy.io.netcdf_file(filename, mmap=None) as f:
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
self._read()
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
self._read_dim_array()
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
raise ValueError("Unexpected header.")
ValueError: Unexpected header.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 356, in __init__
topology = _topology_from_file_like(self.filename,
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/MDAnalysis/core/universe.py", line 125, in _topology_from_file_like
raise ValueError(
ValueError: Failed to construct topology from file example.nc with parser <class 'MDAnalysis.topology.MinimalParser.MinimalParser'>.
Error: Unexpected header.
>>> from scipy.io import netcdf_file
>>> nc = netcdf_file("example.nc", "r")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 279, in __init__
self._read()
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 610, in _read
self._read_dim_array()
File "/home/bye/miniconda3/envs/research/lib/python3.9/site-packages/scipy/io/_netcdf.py", line 620, in _read_dim_array
raise ValueError("Unexpected header.")
ValueError: Unexpected header.
Reading the LAMMPS trajectory file using netCDF4.Dataset:
>>> import netCDF4 as nc
>>> nc_ = nc.Dataset("example.nc")
>>> nc_
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT_DATA data model, file format NETCDF3):
Conventions: AMBER
ConventionVersion: 1.0
program: LAMMPS
programVersion: 21 Nov 2023
dimensions(sizes): frame(11), atom(1576), cell_spatial(3), cell_angular(3), label(10), spatial(3)
variables(dimensions): |S1 spatial(spatial), |S1 cell_spatial(spatial), |S1 cell_angular(spatial, label), float32 time(frame), float32 cell_origin(frame, cell_spatial), float32 cell_lengths(frame, cell_spatial), float32 cell_angles(frame, cell_angular), int32 id(frame, atom), int32 type(frame, atom), float32 q(frame, atom), float32 coordinates(frame, atom, spatial), int32 ix(frame, atom), int32 iy(frame, atom), int32 iz(frame, atom), float32 velocities(frame, atom, spatial), float32 forces(frame, atom, spatial)
groups:
For what it's worth, I recently dropped support for scipy.io.netcdf_file altogether in favor of netCDF4.Dataset in my workflow because the former had bizarre behavior when used as an OpenMM trajectory writer, such as randomly writing the first frame of the simulation three times--once at the beginning and twice at the end of the trajectory file--despite correctly flushing/syncing and closing the NetCDF file at the end of the simulation. The SciPy implementation also doesn't report the length of an "unlimited" dimension (like time), unlike netCDF4.
This file is a bit strange: it starts with CDF5 instead of the usual and expected CDF2. Reading around, CDF2 seems to indicate 64-bit offsets (which are required by the format: https://ambermd.org/netcdf/nctraj.xhtml), and CDF5 seems to indicate 64-bit data and offsets (which would also moves the header around, explaining the error from scipy).
The code for this seems to have been around for a while in the netcdf-c library, not sure what caused this change in the output of LAMMPS. Maybe something about the specific version of netcdf LAMMPS is using?
@bbye98 did you find out if anything changed in the LAMMPS netcdf output?
Hi! Apologies for the late response. I'm not sure which version of NetCDF LAMMPS is using, but I can try installing an older version of LAMMPS (say, from 2021) to test.