datasets
Cannot import datasets - ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility
Describe the bug
When trying to import datasets, I get a pyarrow ValueError:
Traceback (most recent call last):
File "/Users/edward/test/test.py", line 1, in
Steps to reproduce the bug
import datasets
Expected behavior
Successful import
Environment info
Conda environment, macOS, Python 3.9.12, datasets 2.12.0
Based on https://github.com/rapidsai/cudf/issues/10187, this probably means your pyarrow installation is not compatible with datasets.
Can you please execute the following commands in the terminal and paste the output here?
conda list | grep arrow
python -c "import pyarrow; print(pyarrow.__file__)"
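The same details can also be collected from Python without importing pyarrow, which may help when the import itself is what fails. This is only a minimal sketch, assuming Python 3.8 or newer; the helper names `installed_version` and `module_location` are hypothetical and are not part of datasets or pyarrow. Note that arrow-cpp is a conda package rather than a Python distribution, so `conda list` is still needed to see it.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the version recorded in the package metadata, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_location(module_name):
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Report what is installed on disk for the two packages discussed here.
    for name in ("pyarrow", "datasets"):
        print(name, installed_version(name), module_location(name))
```

If the reported pyarrow path lies inside the conda environment while `conda list` shows its channel as pypi, the package was installed with pip on top of conda.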
Here is the output to the first command:
arrow-cpp 11.0.0 py39h7f74497_0
pyarrow 12.0.0 pypi_0 pypi
and the second:
/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/__init__.py
Thanks!
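Several reports in this thread show the same pattern: arrow-cpp 11.x from conda next to pyarrow 12.x from PyPI. The sketch below parses the pasted `conda list | grep arrow` output and flags that pattern. It is an illustration only: the function names are hypothetical, treating a missing channel column as "defaults" is an assumption, and a major-version difference is the thread's working hypothesis rather than a confirmed root cause.

```python
def arrow_versions(conda_list_output):
    """Map package name -> (version, channel) for rows of `conda list | grep arrow`."""
    found = {}
    for line in conda_list_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank lines and conda's header comments
        name, version = fields[0], fields[1]
        # Assumption: rows without a fourth column come from the default channel.
        channel = fields[3] if len(fields) > 3 else "defaults"
        found[name] = (version, channel)
    return found


def major_mismatch(found):
    """True when arrow-cpp and pyarrow are both present with different major versions."""
    if "arrow-cpp" not in found or "pyarrow" not in found:
        return False
    cpp_major = found["arrow-cpp"][0].split(".")[0]
    py_major = found["pyarrow"][0].split(".")[0]
    return cpp_major != py_major


# Output pasted from the report above.
output = """arrow-cpp 11.0.0 py39h7f74497_0
pyarrow 12.0.0 pypi_0 pypi"""
found = arrow_versions(output)
print(found)
print("major version mismatch:", major_mismatch(found))
```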
After installing pytesseract 0.3.10, I got the above error. FYI:
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback): pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility. Expected 88 from C header, got 72 from PyObject
I got the same error. pyarrow 12.0.0, released in May 2023 (https://pypi.org/project/pyarrow/), is not compatible. Running pip install pyarrow==11.0.0 to force-install the previous version solved the problem.
Do we need to update dependencies? I can work on that if no one else is working on it.
Please note that our CI properly passes all tests with pyarrow-12.0.0, for Python 3.7 and Python 3.10, for Ubuntu and Windows: see for example https://github.com/huggingface/datasets/actions/runs/5157324334/jobs/9289582291
For conda with Python 3.8.16, this solved my problem! Thanks!
Thanks for replying. I am not sure about those environments, but it seems like pyarrow-12.0.0 does not work for conda with Python 3.8.16.
Got the same error with:
arrow-cpp 11.0.0 py310h7516544_0
pyarrow 12.0.0 pypi_0 pypi
python 3.10.11 h7a1cb2a_2
datasets 2.13.0 pyhd8ed1ab_0 conda-forge
Running pip install pyarrow==11.0.0 to force-install the previous version solved the issue for me as well.
Downgrading with pip install pyarrow==11.0.0 solved it for me also.
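Based on rapidsai/cudf#10187, this probably means your pyarrow installation is not compatible with datasets.
Can you please execute the following commands in the terminal and paste the output here?
conda list | grep arrow
python -c "import pyarrow; print(pyarrow.__file__)"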
arrow-cpp 11.0.0 py310h7516544_0
pyarrow 12.0.1 pypi_0 pypi
/root/miniconda3/lib/python3.10/site-packages/pyarrow/__init__.py
Got the same problem with
arrow-cpp 11.0.0 py310h1fc3239_0
pyarrow 12.0.1 pypi_0 pypi
miniforge3/envs/mlp/lib/python3.10/site-packages/pyarrow/__init__.py
Reverting back to pyarrow 11 solved the problem.
Solved with pip install pyarrow==11.0.0
My case was different. Solved with pip install pyarrow==12.0.1 and pip install cchardet.
env: Python 3.9.16, transformers 4.32.1
Running pip install pyarrow==11.0.0 to force-install the previous version works for me as well.
I guess it also depends on the Python version. I have Python 3.11.5 and pyarrow==12.0.0, and it works!
Hi, if this helps anyone, pip install pyarrow==11.0.0 did not work for me (I'm using Colab) but this worked: !pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11
Thanks! I ran into the same problem and your suggestion solved it.
I've been loading the same dataset for months on Colab, and just now I got this error as well (I was doing a quiet install, so I didn't notice it initially). I think Colab has changed their image recently (I had some errors regarding CUDA previously as well). Beware of this and restart the runtime if you're doing quiet pip installs. Moreover, installing the stable version of datasets from PyPI gives this:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ibis-framework 7.1.0 requires pyarrow<15,>=2, but you have pyarrow 15.0.0 which is incompatible.
Successfully installed datasets-2.17.0 dill-0.3.8 multiprocess-0.70.16 pyarrow-15.0.0
WARNING: The following packages were previously imported in this runtime:
[pyarrow]
You must restart the runtime in order to use newly installed versions.
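The warning above means the running process still holds the old pyarrow in memory while a newer one sits on disk. A rough way to detect that state from a notebook cell is sketched below. The function names are hypothetical, and the check assumes the module exposes a `__version__` attribute (pyarrow does) and that the package metadata on disk reflects the latest install.

```python
import sys
from importlib import metadata


def versions_differ(loaded_version, installed_version):
    """True when both versions are known and they are not the same string."""
    if loaded_version is None or installed_version is None:
        return False
    return loaded_version != installed_version


def needs_restart(module_name, dist_name=None):
    """Compare the module already imported in this process with the one on disk."""
    module = sys.modules.get(module_name)
    if module is None:
        return False  # not imported yet, so a fresh import picks up the new install
    try:
        installed = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        return False
    return versions_differ(getattr(module, "__version__", None), installed)


if needs_restart("pyarrow"):
    print("pyarrow changed on disk after it was imported; restart the runtime.")
```

Running this right after a pip install, and before importing datasets, gives an early hint that a restart is needed even when pip's own output was suppressed.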
For Colab: pip install pyarrow==11.0.0
The above methods didn't help me, so I installed an older version with !pip install datasets==2.16.1, and import datasets worked!
@rasith1998 @PennlaineChu You can avoid this issue by restarting the session after the datasets installation (see https://github.com/huggingface/datasets/issues/6661 for more info).
Also, we've contacted Google Colab folks to update the default PyArrow installation, so the issue should soon be "officially" resolved on their side.
This has been done! Google Colab now pre-installs PyArrow 14.0.2, which makes this issue unlikely to happen, so I'm closing it.
I am facing this issue outside of Colab, in a normal Python (3.10.14) environment:
pyarrow==11.0.0
datasets==2.20.0
transformers==4.41.2
What can I do to solve it?
I am somewhat bound to pyarrow==11.0.0. Is there a version of datasets that supports this?
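One way to investigate a question like this is to read the pyarrow requirement that a given datasets release declares in its own package metadata. The sketch below does that for whichever datasets version is currently installed. It does not say which release accepts pyarrow 11; the helper name `requirements_on` is hypothetical, and the name matching is a simplified approximation of how requirement strings are written.

```python
import re
from importlib import metadata


def requirements_on(requirement_strings, dependency):
    """Keep only the requirement strings whose package name equals `dependency`."""
    def normalize(name):
        # Treat runs of '-', '_' and '.' as equivalent and ignore case.
        return re.sub(r"[-_.]+", "-", name).lower()

    wanted = normalize(dependency)
    matches = []
    for req in requirement_strings or []:
        # The package name is the leading run of name characters.
        m = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
        if m and normalize(m.group(1)) == wanted:
            matches.append(req)
    return matches


try:
    declared = metadata.requires("datasets")
except metadata.PackageNotFoundError:
    declared = None  # datasets is not installed in this environment
print(requirements_on(declared, "pyarrow"))
```

Repeating this in a scratch environment for older datasets releases shows the range each one declares, which can then be compared against the pinned pyarrow version.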