fastparquet

Installing numpy-1.20.0rc1 causes errors

Open lukasstankiewicz opened this issue 4 years ago • 16 comments

What happened:

Package versions in use before 07.12.2020: fastparquet-0.4.1, llvmlite-0.34.0, numba-0.51.2, numpy-1.19.2, packaging-20.4, pandas-1.1.3, pyparsing-2.4.7, python-dateutil-2.8.1, python-snappy-0.5.4, pytz-2020.1, thrift-0.13.0

Since 07.12.2020 I have been getting a fastparquet error on Python 3.6:

Collecting fastparquet==0.4.1
  Downloading fastparquet-0.4.1.tar.gz (28.6 MB)
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-7obzjc0l/fastparquet/setup.py'"'"'; __file__='"'"'/tmp/pip-install-7obzjc0l/fastparquet/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-b0abneeh
         cwd: /tmp/pip-install-7obzjc0l/fastparquet/
    Complete output (68 lines):
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules
        yield saved
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context
        yield
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 250, in run_setup
        _execfile(setup_script, ns)
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 45, in _execfile
        exec(code, globals, locals)
      File "/tmp/easy_install-ndh2xtme/numpy-1.20.0rc1/setup.py", line 30, in <module>
        extra = {}
    RuntimeError: Python version >= 3.7 required.

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-7obzjc0l/fastparquet/setup.py", line 98, in <module>
        **extra
      File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 128, in setup
        _install_setup_requires(attrs)
      File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 123, in _install_setup_requires
        dist.fetch_build_eggs(dist.setup_requires)
      File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 513, in fetch_build_eggs
        replace_conflicting=True,
      File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 774, in resolve
        replace_conflicting=replace_conflicting
      File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1057, in best_match
        return self.obtain(req, installer)
      File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1069, in obtain
        return installer(requirement)
      File "/usr/lib/python3/dist-packages/setuptools/dist.py", line 580, in fetch_build_egg
        return cmd.easy_install(req)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 698, in easy_install
        return self.install_item(spec, dist.location, tmpdir, deps)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 724, in install_item
        dists = self.install_eggs(spec, download, tmpdir)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 909, in install_eggs
        return self.build_and_install(setup_script, setup_base)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1177, in build_and_install
        self.run_setup(setup_script, setup_base, args)
      File "/usr/lib/python3/dist-packages/setuptools/command/easy_install.py", line 1163, in run_setup
        run_setup(setup_script, args)
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 253, in run_setup
        raise
      File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context
        yield
      File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 166, in save_modules
        saved_exc.resume()
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 141, in resume
        six.reraise(type, exc, self._tb)
      File "/usr/lib/python3/dist-packages/setuptools/_vendor/six.py", line 685, in reraise
        raise value.with_traceback(tb)
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 154, in save_modules
        yield saved
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 195, in setup_context
        yield
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 250, in run_setup
        _execfile(setup_script, ns)
      File "/usr/lib/python3/dist-packages/setuptools/sandbox.py", line 45, in _execfile
        exec(code, globals, locals)
      File "/tmp/easy_install-ndh2xtme/numpy-1.20.0rc1/setup.py", line 30, in <module>
        extra = {}
    RuntimeError: Python version >= 3.7 required.
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 20.2.4; however, version 20.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
ERROR: Service 'aviation-pipelines-service' failed to build : The command '/bin/sh -c pip3 install -r requirements.python.txt' returned a non-zero code: 1

It had been working fine for 2 months, with everything installed in Docker. So I upgraded Python to 3.7.5, as is now required (why?). It is strange that fastparquet tries to install a release-candidate version of numpy:

File "/tmp/easy_install-ndh2xtme/numpy-1.20.0rc1/setup.py", line 30, in <module>
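On the "(why?)": the RuntimeError in the log above is raised while running the setup.py of numpy 1.20.0rc1, and numpy 1.20 no longer supports Python 3.6. A minimal sketch of that kind of interpreter guard follows; `require_python` is a hypothetical helper for illustration, not numpy's actual setup.py code.

```python
import sys

# Hypothetical guard illustrating the kind of check that produced
# "RuntimeError: Python version >= 3.7 required." in the log above.
# This is NOT numpy's actual setup.py code.
MIN_PYTHON = (3, 7)


def require_python(min_version=MIN_PYTHON, current=None):
    """Raise RuntimeError if `current` (default: this interpreter) is too old."""
    if current is None:
        current = sys.version_info[:2]
    current = tuple(current)
    if current < tuple(min_version):
        raise RuntimeError("Python version >= {}.{} required.".format(*min_version))
    return current


if __name__ == "__main__":
    major, minor = require_python()
    print("Python {}.{} satisfies the >= 3.7 requirement".format(major, minor))
```

A check like this only fires once the source distribution has already been downloaded and its setup.py executed. The declarative alternative is `python_requires=">=3.7"` in `setup()`, which recent pip versions read from package metadata to skip releases that do not support the running interpreter.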

Now, when running this script:

import sys, getopt
import pandas as pd
import warnings

def main(argv):
    inputfile = ''
    outputfile = ''
    try:
        # Accepts -i/--ifile <input.parquet>, -o/--ofile <output.csv>, -h for help.
        opts, args = getopt.getopt(argv, "hi:o:", ["file=", "ifile=", "ofile="])
    except getopt.GetoptError:
        print('test.py -i <inputfile> -o <outputfile>')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print('test.py -i <inputfile> -o <outputfile>')
            sys.exit()
        elif opt in ("-i", "--ifile"):
            inputfile = arg
        elif opt in ("-o", "--ofile"):
            outputfile = arg

    # Selecting engine='fastparquet' makes pandas import fastparquet here,
    # which is where the error below is raised.
    df = pd.read_parquet(inputfile, engine='fastparquet')
    df.to_csv(outputfile)

    print('Done')

if __name__ == "__main__":
    main(sys.argv[1:])

I get this error:

Traceback (most recent call last):
  File "/home/node/app/src/core/parquet/convert-to-csv.py", line 29, in <module>
    main(sys.argv[1:])
  File "/home/node/app/src/core/parquet/convert-to-csv.py", line 23, in main
    df = pd.read_parquet(inputfile, engine='fastparquet')
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py", line 316, in read_parquet
    impl = get_engine(engine)
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py", line 44, in get_engine
    return FastParquetImpl()
  File "/usr/local/lib/python3.7/dist-packages/pandas/io/parquet.py", line 155, in __init__
    "fastparquet", extra="fastparquet is required for parquet support."
  File "/usr/local/lib/python3.7/dist-packages/pandas/compat/_optional.py", line 107, in import_optional_dependency
    module = importlib.import_module(name)
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/usr/local/lib/python3.7/dist-packages/fastparquet/__init__.py", line 5, in <module>
    from .core import read_thrift
  File "/usr/local/lib/python3.7/dist-packages/fastparquet/core.py", line 9, in <module>
    from . import encoding
  File "/usr/local/lib/python3.7/dist-packages/fastparquet/encoding.py", line 19, in <module>
    from .speedups import unpack_byte_array
  File "__init__.pxd", line 242, in init fastparquet.speedups
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Environment:

  • Python version: from 3.6 to 3.8
  • Docker image based on Ubuntu 18.04
  • Pip: 20.2.3

lukasstankiewicz, Dec 09 '20

Interestingly, this Dockerfile

FROM ubuntu:18.04 AS development

USER root:root

RUN \
  apt-get update && apt-get install -y curl make gcc g++ cmake python3.6 python3.6-dev python3-pip gnupg libsnappy-dev

COPY --chown=root:root   requirements.python.txt ./

RUN pip3 install --upgrade pip==20.2.3
RUN pip3 --version
RUN pip3 install -r requirements.python.txt

returns the first error from the description above,

whereas when I install each library separately, it works fine:

FROM ubuntu:18.04 AS development

USER root:root

RUN \
  apt-get update && apt-get install -y curl make gcc g++ cmake python3.6 python3.6-dev python3-pip gnupg libsnappy-dev

COPY --chown=root:root   requirements.python.txt ./

RUN pip3 install --upgrade pip==20.2.3
RUN pip3 --version

RUN pip3 install numpy==1.18.0
RUN pip3 install pandas==1.1.3
RUN pip3 install fastparquet==0.4.1
RUN pip3 install python-snappy==0.5.4

No errors at all; everything works fine.

lukasstankiewicz, Dec 09 '20

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

This is the critical part, I think, and it has shown up before. When this happens during import, it's just a warning. There is probably something about it on the numpy tracker.

What actually happens during pip install depends on your situation. A binary wheel is built against a specific version of numpy, but if you build from source, you build against whichever numpy is currently installed, whether or not the C code is first regenerated with Cython.
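The error message itself says which way the mismatch goes: the compiled extension expects a larger numpy.ndarray struct (88 bytes, from the C header it was built with) than the numpy imported at run time provides (80 bytes). A small sketch that pulls those numbers out of a traceback; `explain_size_mismatch` is a hypothetical helper, and the regex is an assumption based only on the wording seen in this issue.

```python
import re

# Pattern for the "size changed" check seen in this issue's traceback.
# It is based on that single message, not on every wording Cython may emit.
_SIZE_CHANGED = re.compile(
    r"(?P<type>[\w.]+) size changed, may indicate binary incompatibility\. "
    r"Expected (?P<expected>\d+) from C header, got (?P<got>\d+) from PyObject"
)


def explain_size_mismatch(message):
    """Return a one-line diagnosis of a 'size changed' error, or None if absent."""
    match = _SIZE_CHANGED.search(message)
    if match is None:
        return None
    expected = int(match.group("expected"))  # size the extension was compiled for
    got = int(match.group("got"))  # size of the type actually imported at run time
    if expected > got:
        direction = "compiled against a larger struct than the runtime provides"
    else:
        direction = "compiled against a smaller struct than the runtime provides"
    return "{}: C header expects {} bytes, runtime object has {} ({})".format(
        match.group("type"), expected, got, direction
    )


traceback_tail = (
    "ValueError: numpy.ndarray size changed, may indicate binary incompatibility. "
    "Expected 88 from C header, got 80 from PyObject"
)
print(explain_size_mismatch(traceback_tail))
```

For the ValueError in this issue it reports 88 vs 80. A header size larger than the runtime size would fit fastparquet having been compiled against a newer numpy (such as the 1.20.0rc1 fetched at build time) than the one imported at run time; that reading is an inference, not something confirmed in the thread.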

I don't know why pip would be picking the RC numpy...

Note: I generally install using conda to avoid such problems.

martindurant, Dec 09 '20

When this happens during import, it's just a warning

Hm, maybe not, but there are similar warnings going around about the size of dtype.

martindurant, Dec 09 '20

It's not a warning but an exception. We also see it in several other projects.

All these projects use Dockerfiles that run pip install -r requirements.txt.

freekwiekmeijer, Dec 11 '20

RUN \
  apt-get update && apt-get install -y curl make gcc g++ cmake python3.6 python3.6-dev python3-pip gnupg libsnappy-dev

COPY --chown=root:root requirements.python.txt ./

RUN pip3 install --upgrade pip==20.2.3
RUN pip3 --version

RUN pip3 install numpy==1.18.0
RUN pip3 install pandas==1.1.3
RUN pip3 install fastparquet==0.4.1
RUN pip3 install python-snappy==0.5.4


No errors at all; everything works fine.

I am able to reproduce the problem using a Dockerfile that contains RUN pip install -t <custom location> -r requirements.txt.

To reproduce it, I add a Python file, test.py, and run it from the Dockerfile with RUN python test.py.

test.py:

#!/usr/bin/env python
import fastparquet
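A slightly extended variant of test.py could report every module that fails to import, including the ValueError raised by a binary mismatch, and exit non-zero so the Docker build stops at that step. This is a sketch; `smoke_import` is a hypothetical helper, not part of fastparquet.

```python
#!/usr/bin/env python
import importlib
import sys


def smoke_import(names):
    """Import each module in turn; return {name: "ErrorType: message"} for failures."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except (ImportError, ValueError) as exc:
            # ValueError covers "numpy.ndarray size changed ..." raised by
            # compiled extensions built against a different numpy.
            failures[name] = "{}: {}".format(type(exc).__name__, exc)
    return failures


if __name__ == "__main__":
    # Example Dockerfile usage: RUN python test.py numpy fastparquet
    problems = smoke_import(sys.argv[1:])
    for name, error in problems.items():
        print("FAILED to import", name, "->", error)
    if problems:
        sys.exit(1)
```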

Then, to verify the workaround, I inserted these new lines in the Dockerfile:

RUN pip install numpy==1.18.0
RUN pip install fastparquet==0.4.1
RUN pip install python-snappy==0.5.4

Note that these install into the global system site-packages location (/usr/local/lib/python3.7/lib/site-packages or similar), not into the custom app location. Now it works. Both the order (numpy before fastparquet) and the location (global system site-packages instead of the app directory) matter.
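Because the location matters, it can help to print where each package would actually be imported from. A minimal sketch using `importlib.util.find_spec`, which locates a top-level module without executing it; the helper name `import_origins` is hypothetical.

```python
import importlib.util
import sys


def import_origins(names):
    """Map each top-level module name to the file it would load from, or None."""
    origins = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


if __name__ == "__main__":
    # sys.path is searched in order, so the first directory containing the
    # package wins.
    for entry in sys.path:
        print("search path:", entry)
    for name, origin in import_origins(["numpy", "fastparquet"]).items():
        print(name, "->", origin)
```

If numpy resolves to the system site-packages while fastparquet resolves to the custom -t directory (or vice versa), the two may have been installed, and compiled, against different numpy versions.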

freekwiekmeijer, Dec 14 '20

If you have a suggestion for the right incantation for requirements.txt, please comment in #538.

martindurant, Dec 14 '20

Based on the intent of https://github.com/dask/fastparquet/blob/master/setup.py#L25-L29, this block looks problematic: https://github.com/dask/fastparquet/blob/master/setup.py#L74-L76 (it includes numpy no matter which command is being run). The StackOverflow link in that block has a less-upvoted answer that links to how scipy includes numpy: https://github.com/scipy/scipy/blob/master/setup.py#L566 . Seems like they have a more robust solution (probably don't need all of it) that can be used here?
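The idea can be sketched like this: only ask for numpy when setup.py is run with a command that compiles extensions, so that metadata-only commands such as egg_info (the command that failed in the original log) do not trigger a numpy download. This is an illustrative sketch, not fastparquet's or scipy's actual code; `needs_numpy` and the command set are assumptions and do not cover every setuptools command.

```python
import sys

# Commands assumed to need numpy's C headers. Metadata-only commands such as
# egg_info, --help or --version are deliberately absent. Illustrative only.
BUILD_COMMANDS = {"build", "build_ext", "bdist_wheel", "bdist_egg", "develop", "install"}


def needs_numpy(argv=None):
    """Return True if any setup.py command in argv compiles extensions."""
    args = sys.argv[1:] if argv is None else argv
    return any(arg in BUILD_COMMANDS for arg in args)


# In setup.py this could gate the build-time requirement, e.g.:
# setup(..., setup_requires=["numpy"] if needs_numpy() else [])
```

With this gating, pip's metadata step would not fetch numpy, but the later build step still would, so installing a compatible numpy first (the workaround used in this thread) would remain relevant.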

davidhao3300, Dec 14 '20

Seems like they have a more robust solution (probably don't need all of it) that can be used here?

I'm willing to try it! Do you want to put in a PR? As long as the CI build runs python setup.py or pip install, that should be enough of a test.

martindurant, Dec 14 '20

Given that we encountered this issue at work and already have a workaround (install numpy first), it's unlikely I'll get to this within the workweek; I mostly wanted to offer a solution for anyone interested in fixing this properly. That said, I may have time over the weekend to work on this, but no promises!

davidhao3300, Dec 14 '20

A couple of extra questions:

  • does this problem happen on py37-39?
  • how about if you don't upgrade the version of pip?

martindurant, Dec 15 '20

The following works. Note that fastparquet 0.4.1 is no longer supposed to work on py36, so you would need to go back to an earlier version that does. I don't know whether that would also fix the numpy version problem.

FROM ubuntu:18.04 AS development

USER root:root

RUN \
  apt-get update && apt-get install -y curl make gcc g++ cmake python3.7 python3.7-dev python3-pip gnupg libsnappy-dev

RUN python3.7 -m pip install --upgrade pip==20.2.3
RUN python3.7 -m pip --version
RUN python3.7 -m pip install fastparquet

martindurant, Dec 15 '20

If that's enough of a solution, please close this; in any case, I won't hold up v0.5.0 for this.

martindurant, Dec 15 '20

Another solution seems to be to disable installing fastparquet as a binary package by using the --no-binary flag documented here. In requirements.txt, the line can be fastparquet --no-binary=fastparquet, as per this StackOverflow post.

paulistoan, Jan 04 '21

Not to piggyback too much on an old issue, but there is a new warning that comes with this numpy version.

../../../../../usr/share/miniconda3/envs/test-environment/lib/python3.8/site-packages/fastparquet/writer.py:70
  /usr/share/miniconda3/envs/test-environment/lib/python3.8/site-packages/fastparquet/writer.py:70: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. Use `bool` by itself, which is identical in behavior, to silence this warning. If you specifically wanted the numpy scalar type, use `np.bool_` here.
    pd.BooleanDtype(): np.bool

Fix: https://github.com/dask/fastparquet/pull/551
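For reference, the replacement the warning suggests does not change dtype behaviour: builtin `bool` and `np.bool_` resolve to the same numpy dtype. A minimal sketch; `as_bool_array` is a hypothetical helper, not part of fastparquet or the linked PR.

```python
import numpy as np

# The warning describes np.bool as an alias of the builtin bool; both
# replacements it names resolve to numpy's boolean dtype.
BOOL_DTYPE = np.dtype(bool)
assert BOOL_DTYPE == np.dtype(np.bool_)


def as_bool_array(values):
    """Convert an iterable to a numpy boolean array without the np.bool alias."""
    return np.asarray(values, dtype=np.bool_)


print(as_bool_array([1, 0, 1]))
```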

jsignell, Jan 21 '21

FWIW, fastparquet 0.6.0post1 doesn't work at all due to this bug, whereas fastparquet 0.5.0 used to work (numpy 1.19.4).

edwintorok, May 09 '21

Would it be best to not publish a binary wheel and let the user build it for their local version of numpy?

edwintorok, May 09 '21