
BUG: Scipy.io mmread does not respect variants of coordinate format

Open: rohanmohapatra opened this issue 1 year ago • 3 comments

Describe your issue.

Hey!

Based on the Matrix Market file format description published by NIST (https://math.nist.gov/MatrixMarket/formats.html), mmread does not handle the pattern variant of the coordinate format.

The file format says:

as well as for those in which only the position of the nonzero entries is prescribed (pattern matrices).

That means that, for a pattern matrix, entry lines do not have to be given as

i j value

but only as i j (the position of the nonzero). Reading such a file leads to the following error in the reproducible code:

ValueError: Header line not of length 3: 2 1
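To make the distinction concrete, here is a minimal sketch of how a single coordinate entry line could be parsed depending on the field named in the banner. It is not SciPy's implementation: `parse_entry` is a hypothetical helper, and it only assumes, per the Matrix Market format description, that a `pattern` entry holds two 1-based indices while a `real` or `integer` entry adds one value. Other fields such as `complex` are left out.

```python
def parse_entry(line: str, field: str) -> tuple[int, int, float]:
    """Parse one coordinate-format entry line (hypothetical helper).

    Matrix Market indices are 1-based; they are returned 0-based here.
    For the 'pattern' field only the position is stored, so the value is 1.0.
    """
    parts = line.split()
    if field == "pattern":
        if len(parts) != 2:
            raise ValueError(f"pattern entry must have 2 fields: {line!r}")
        i, j = map(int, parts)
        return i - 1, j - 1, 1.0
    if field in ("real", "integer"):
        if len(parts) != 3:
            raise ValueError(f"{field} entry must have 3 fields: {line!r}")
        return int(parts[0]) - 1, int(parts[1]) - 1, float(parts[2])
    # complex and other fields are outside the scope of this sketch
    raise NotImplementedError(f"field {field!r} not handled in this sketch")
```

For example, `parse_entry("2 1", "pattern")` gives `(1, 0, 1.0)`, while the same line with `field="real"` is rejected because the value is missing.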

Reproducing Code Example

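import urllib.request
import zipfile

from scipy.io import mmread


def download_and_unzip_dataset(url: str, file_name: str) -> str:
    """Download a zip archive, extract it, and return the path to file_name."""
    extract_dir = "output"
    # urlretrieve saves the archive to a temporary file and returns its path
    zip_path, _ = urllib.request.urlretrieve(url)
    with zipfile.ZipFile(zip_path, "r") as f:
        f.extractall(extract_dir)
    return f"{extract_dir}/{file_name}"


url = "https://nrvis.com/download/data/ca/ca-CSphd.zip"
save_path = download_and_unzip_dataset(url, "ca-CSphd.mtx")

# With SciPy 1.11.4 this raises: ValueError: Header line not of length 3: 2 1
data = mmread(save_path)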
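A quick way to check what mmread will see is to print the first few lines of the extracted file. The sketch below uses only the standard library; `head` is a hypothetical helper, and `save_path` is the variable from the reproduction code above.

```python
def head(path: str, n: int = 5) -> list[str]:
    """Return up to n leading lines of a text file, without line endings."""
    lines = []
    # errors="replace" keeps this from failing on unexpected bytes
    with open(path, encoding="utf-8", errors="replace") as fh:
        for _ in range(n):
            line = fh.readline()
            if not line:  # end of file reached before n lines
                break
            lines.append(line.rstrip("\r\n"))
    return lines


# Example usage with the path from the reproduction code:
# for line in head(save_path):
#     print(line)
```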

Error message

/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in mmread(source)
    127            [0., 0., 0., 0., 0.]])
    128     """
--> 129     return MMFile().read(source)
    130 
    131 # -----------------------------------------------------------------------------

/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in read(self, source)
    579 
    580         try:
--> 581             self._parse_header(stream)
    582             return self._parse_body(stream)
    583 

/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in _parse_header(self, stream)
    643     def _parse_header(self, stream):
    644         rows, cols, entries, format, field, symmetry = \
--> 645             self.__class__.info(stream)
    646         self._init_attrs(rows=rows, cols=cols, entries=entries, format=format,
    647                          field=field, symmetry=symmetry)

/usr/local/lib/python3.10/dist-packages/scipy/io/_mmio.py in info(self, source)
    406             else:
    407                 if not len(split_line) == 3:
--> 408                     raise ValueError("Header line not of length 3: " +
    409                                      line.decode('ascii'))
    410                 rows, cols, entries = map(int, split_line)

ValueError: Header line not of length 3: 2 1
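The traceback ends in `MMFile.info`, at the check that the size line has exactly three fields. The following is a simplified, stdlib-only sketch of that step, not SciPy's actual code; `read_size_line` is a hypothetical name. It assumes, as the Matrix Market format describes, that lines starting with `%` after the banner are comments, and that in the coordinate format the first non-comment line holds rows, columns, and the number of entries. If that size line is itself commented out, the first entry line is checked in its place.

```python
def read_size_line(lines: list[str]) -> tuple[int, int, int]:
    """Return (rows, cols, entries) for a coordinate-format Matrix Market file.

    Simplified sketch: the banner, comment lines (starting with '%') and,
    as an extra assumption of this sketch, blank lines are skipped.
    """
    for line in lines[1:]:  # lines[0] is the %%MatrixMarket banner
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # comment or blank line
        parts = stripped.split()
        if len(parts) != 3:
            # Same condition that raises the ValueError in the traceback
            raise ValueError("Header line not of length 3: " + stripped)
        rows, cols, entries = map(int, parts)
        return rows, cols, entries
    raise ValueError("no size line found")
```

With a commented-out size line followed by the entry `2 1`, this sketch raises the same message as the traceback; with the `%` removed from the size line, it returns the three counts.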

SciPy/NumPy/Python version and system information

1.11.4 1.25.2 sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.21.dev
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /usr/local/include
    lib directory: /usr/local/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= HASWELL MAX_THREADS=2
    pc file directory: /usr/local/lib/pkgconfig
    version: 0.3.21.dev
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.11.0
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 0.29.36
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 10.2.1
  pythran:
    include directory: ../../tmp/pip-build-env-c6c8ru56/overlay/lib/python3.10/site-packages/pythran
    version: 0.14.0
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
  cross-compiled: true
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: linux
Python Information:
  path: /opt/python/cp310-cp310/bin/python
  version: '3.10'

rohanmohapatra, Feb 18 '24 04:02

Is this related to https://github.com/scipy/scipy/issues/9426?

dschmitz89, Feb 18 '24 08:02

Is this related to #9426?

I do not think so. The pattern variant does not require each coordinate line to have three fields; the format also allows lines like x1 y1, so mmread should accept them. mmread raises an error because it expects every coordinate line to have length 3, which is not the case here.

rohanmohapatra, Feb 19 '24 22:02

I think this is an issue with the file rather than with SciPy. The error you're getting is not related to the matrix values; it comes from the header.

The header of the file you're reading is:

%%MatrixMarket matrix coordinate pattern general
% type: directed graph
% 1882 1882 1740

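However, the last header line, which specifies the dimensions and the number of nonzero entries, should not start with a '%' symbol. Lines beginning with '%' are comments, so mmread skips this one and treats the first entry line, 2 1, as the size line; because that line has only two fields, it raises "Header line not of length 3: 2 1". If I edit the file so that this line does not start with '%', it is read successfully: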

%%MatrixMarket matrix coordinate pattern general
% type: directed graph
1882 1882 1740
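If the file cannot be fixed at the source, one possible workaround is to restore the size line before calling `mmread`. This is a hypothetical helper, `uncomment_size_line`, not part of SciPy, and it rests on a heuristic assumption: when the first data line does not have three fields, the intended size line is the last comment line before it that consists of exactly three integers. That holds for the header quoted above but is not guaranteed for other files.

```python
def uncomment_size_line(lines: list[str]) -> list[str]:
    """Return a copy of `lines` with a commented-out size line restored.

    Hypothetical helper. Heuristic assumption: if the first data line does not
    have three fields, the intended size line is the last comment line before
    it whose body is exactly three non-negative integers.
    """
    out = list(lines)
    candidate = None
    for idx in range(1, len(out)):  # index 0 is the %%MatrixMarket banner
        stripped = out[idx].strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("%"):
            parts = stripped.lstrip("%").split()
            if len(parts) == 3 and all(p.isdigit() for p in parts):
                candidate = idx  # latest comment that looks like "rows cols entries"
            continue
        if len(stripped.split()) == 3:
            return out  # a plausible size line is already present; change nothing
        break  # first data line has the wrong shape, so use the candidate
    if candidate is not None:
        line = out[candidate]
        newline = "\n" if line.endswith("\n") else ""
        out[candidate] = line.strip().lstrip("%").strip() + newline
    return out


# Example usage (assumes save_path from the reproduction code):
# with open(save_path) as fh:
#     lines = fh.readlines()
# with open("ca-CSphd-fixed.mtx", "w") as fh:
#     fh.writelines(uncomment_size_line(lines))
# data = mmread("ca-CSphd-fixed.mtx")
```

The helper returns a new list and leaves the input untouched, so the original file contents can still be compared against the patched version.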

TimButters, Mar 27 '24 22:03