Issue loading datasets -- pyarrow.lib has no attribute

Open margotwagner opened this issue 3 years ago • 1 comments

Describe the bug

I am trying to load sentiment analysis datasets from Hugging Face, but every dataset I try to load via load_dataset fails with the same error: AttributeError: module 'pyarrow.lib' has no attribute 'IpcReadOptions'

Steps to reproduce the bug

from datasets import load_dataset
dataset = load_dataset("glue", "cola")

Expected results

Download datasets without issue.

Actual results

AttributeError: module 'pyarrow.lib' has no attribute 'IpcReadOptions'

Environment info

  • datasets version: 2.3.2
  • Platform: macOS-10.15.7-x86_64-i386-64bit
  • Python version: 3.8.5
  • PyArrow version: 8.0.0
  • Pandas version: 1.1.0

margotwagner, Jul 11 '22 22:07

Hi @margotwagner, thanks for reporting.

Unfortunately, I'm not able to reproduce your bug: in an environment with datasets-2.3.2 and pyarrow-8.0.0, I can load the datasets without any problem:

>>> ds = load_dataset("glue", "cola")
>>> ds
DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 8551
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1043
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1063
    })
})

>>> import pyarrow
>>> pyarrow.__version__
'8.0.0'
>>> from pyarrow.lib import IpcReadOptions
>>> IpcReadOptions
pyarrow.lib.IpcReadOptions

I think there may be a problem in your Python environment: you may also have an older version of pyarrow installed that takes precedence when it is imported.

Could you please check this (just after you tried to load the dataset and got the error)?

>>> import pyarrow
>>> pyarrow.__version__
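
If the version alone is not conclusive, it can also help to see which interpreter is running and which copy of pyarrow it actually imports. The following is only a minimal sketch: `describe_module` is a hypothetical helper written for this thread, not part of datasets or pyarrow, and it relies only on the standard library plus the `__version__` and `__file__` attributes that most packages expose.

```python
import importlib
import sys


def describe_module(module_name, attribute=None):
    """Report which copy of a module this interpreter imports.

    Hypothetical helper for debugging; optionally checks whether the
    imported module exposes a given attribute.
    """
    info = {"python": sys.executable, "module": module_name}
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module (or one of its parent packages) is not importable here.
        info["installed"] = False
        return info
    info["installed"] = True
    # Not every module defines these, so fall back to None.
    info["version"] = getattr(module, "__version__", None)
    info["file"] = getattr(module, "__file__", None)
    if attribute is not None:
        info["has_attribute"] = hasattr(module, attribute)
    return info


if __name__ == "__main__":
    # Which pyarrow is picked up, and from where?
    print(describe_module("pyarrow"))
    # Does the imported pyarrow.lib expose the attribute from the error?
    print(describe_module("pyarrow.lib", "IpcReadOptions"))
```

If the reported `file` path points somewhere other than the environment where pyarrow 8.0.0 was installed, or `has_attribute` is False, that would suggest a different (possibly older) copy is being imported first.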

albertvillanova, Jul 12 '22 04:07