BUG: Scipy.io mmread does not respect variants of coordinate format
Describe your issue.
Hey!
Based on the Matrix Market file format described by NIST (https://math.nist.gov/MatrixMarket/formats.html), mmread does not seem to respect the pattern variant of the coordinate format.
The format description says:
"as well as for those in which only the position of the nonzero entries is prescribed (pattern matrices)."
That essentially means that I do not have to provide each nonzero entry as
i j value
but can give only its position (i j). Reading such a file with the reproducing code below leads to this error:
ValueError: Header line not of length 3: 2 1
Reproducing Code Example
import urllib.request
import zipfile

from scipy.io import mmread


def download_and_unzip_dataset(url: str, file_name: str) -> str:
    # Download the archive to a temporary file and extract it into "output".
    extract_dir = "output"
    zip_path, _ = urllib.request.urlretrieve(url)
    with zipfile.ZipFile(zip_path, "r") as f:
        f.extractall(extract_dir)
    return f"{extract_dir}/{file_name}"


url = "https://nrvis.com/download/data/ca/ca-CSphd.zip"
save_path = download_and_unzip_dataset(url, "ca-CSphd.mtx")
data = mmread(save_path)  # raises the ValueError shown under "Error message"
Error message
/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in mmread(source)
127 [0., 0., 0., 0., 0.]])
128 """
--> 129 return MMFile().read(source)
130
131 # -----------------------------------------------------------------------------
/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in read(self, source)
579
580 try:
--> 581 self._parse_header(stream)
582 return self._parse_body(stream)
583
/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in _parse_header(self, stream)
643 def _parse_header(self, stream):
644 rows, cols, entries, format, field, symmetry = \
--> 645 self.__class__.info(stream)
646 self._init_attrs(rows=rows, cols=cols, entries=entries, format=format,
647 field=field, symmetry=symmetry)
/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in info(self, source)
406 else:
407 if not len(split_line) == 3:
--> 408 raise ValueError("Header line not of length 3: " +
409 line.decode('ascii'))
410 rows, cols, entries = map(int, split_line)
ValueError: Header line not of length 3: 2 1
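The check in the traceback can be illustrated with a small standalone sketch. It assumes, based only on the traceback above, that the reader skips '%' comment lines after the banner and then requires the next line to have three fields. The helper name first_data_line is hypothetical and this is not SciPy's implementation; the sample text reuses the header quoted later in this thread plus the "2 1" line from the error message, with the remaining entries omitted.

```python
import io


def first_data_line(stream):
    """Return the first line after the banner that is not a '%' comment.

    Hypothetical helper for illustration only; it also skips blank lines,
    which a real reader may treat differently.
    """
    stream.readline()  # skip the %%MatrixMarket banner line
    for line in stream:
        stripped = line.strip()
        if stripped and not stripped.startswith("%"):
            return stripped
    return None


# Header as quoted in this thread, followed by the line from the error message.
sample = io.StringIO(
    "%%MatrixMarket matrix coordinate pattern general\n"
    "% type: directed graph\n"
    "% 1882 1882 1740\n"
    "2 1\n"
)

line = first_data_line(sample)
fields = line.split()
print(repr(line), len(fields))
```

Under these assumptions the line that gets treated as the size line is "2 1", which has two fields rather than three. That would match the text of the ValueError.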
SciPy/NumPy/Python version and system information
1.11.4 1.25.2 sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.21.dev
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.21.dev
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.11.0
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 0.29.36
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  pythran:
    include directory: ../../tmp/pip-build-env-c6c8ru56/overlay/lib/python3.10/site-packages/pythran
    version: 0.14.0
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  cross-compiled: true
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /opt/python/cp310-cp310/bin/python
  version: '3.10'
Is this related to https://github.com/scipy/scipy/issues/9426?
I do not think so. In the pattern variant, a coordinate line does not need three fields; the format also allows lines of the form x1 y1. mmread raises an error because it expects every coordinate line to have three fields, which is not the case here.
I think this is an issue with the file rather than SciPy. The error you're getting is not related to the matrix values, it's the header.
The header of the file you're reading is:
%%MatrixMarket matrix coordinate pattern general
% type: directed graph
% 1882 1882 1740
But the last header line, which specifies the dimensions and the number of nonzero entries, shouldn't start with a '%' symbol. If I edit the file so that this line does not have the % sign, it is read successfully:
%%MatrixMarket matrix coordinate pattern general
% type: directed graph
1882 1882 1740
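The manual edit above could also be scripted. The following is a minimal sketch, not part of SciPy: repair_size_line and SIZE_RE are hypothetical names, and the heuristic assumes a pattern file whose size line was commented out as "% rows cols entries". It only changes the text when the first non-comment line does not already have three fields, so it is not suitable for files whose entries carry a value (i j value).

```python
import re

# A comment line that contains nothing but three integers, e.g. "% 1882 1882 1740".
SIZE_RE = re.compile(r"^%\s*(\d+)\s+(\d+)\s+(\d+)\s*$")


def repair_size_line(text):
    """Uncomment a '% rows cols entries' line in a Matrix Market header.

    Hypothetical helper: a heuristic aimed at pattern files, where entry
    lines have two fields and cannot be mistaken for the size line.
    """
    lines = text.splitlines()
    candidate = None
    for i in range(1, len(lines)):  # index 0 is the %%MatrixMarket banner
        stripped = lines[i].strip()
        if not stripped.startswith("%"):
            # First non-comment line: if it already has three fields,
            # assume the size line is present and leave the text alone.
            if len(stripped.split()) == 3:
                return text
            break
        match = SIZE_RE.match(stripped)
        if match:
            candidate = (i, match)  # remember the last commented-out triple
    if candidate is None:
        return text
    i, match = candidate
    lines[i] = " ".join(match.groups())
    return "\n".join(lines) + "\n"


# Header quoted in this thread, plus the first entry line from the error message.
broken = (
    "%%MatrixMarket matrix coordinate pattern general\n"
    "% type: directed graph\n"
    "% 1882 1882 1740\n"
    "2 1\n"
)
fixed = repair_size_line(broken)
print(fixed)
```

The repaired text can then be written back to a .mtx file and passed to mmread. Running the helper a second time on its own output leaves it unchanged.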