Datasets.load_dataset breaks with Python 3.9

Open Valerieps opened this issue 3 years ago • 2 comments

Error: if Python 3.9 is installed, the setup command installs pandas 1.3.0, because older versions of pandas are not compatible with Python 3.9. This pandas version does not accept the following call:

read_csv("file.csv", names=None, prefix=None) 

breaking the load_dataset function when used with the csv script.

The function call below in build_train_from_ranking.py produces the following error message: "ValueError: Specified named and prefix; you can only specify one."

train_doc_collection = datasets.load_dataset(
        path='csv',
        data_files=collection_path,
        column_names=columns,
        delimiter='\t',
        ignore_verifications=True,
    )['train']

That is because the latest pandas release no longer accepts None for these parameters, only the pandas._libs.lib.no_default constant, as per issue #42387.
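
To illustrate the mechanism, here is a minimal sketch in plain Python. It is a toy model and not pandas' actual code: `_NO_DEFAULT`, `read_table`, and `drop_none` are hypothetical names invented for this example. It shows why an explicit None is treated as "specified" once the default becomes a sentinel, and one possible workaround (dropping None-valued keyword arguments before forwarding them).

```python
# Toy model of a sentinel default; NOT pandas' actual implementation.
_NO_DEFAULT = object()  # hypothetical stand-in for pandas' no_default sentinel


def read_table(path, names=_NO_DEFAULT, prefix=_NO_DEFAULT):
    """Pretend reader that only reproduces the argument check."""
    # Any value other than the sentinel, including None, counts as "specified".
    if names is not _NO_DEFAULT and prefix is not _NO_DEFAULT:
        raise ValueError("Specified named and prefix; you can only specify one.")
    return {
        "path": path,
        "names": None if names is _NO_DEFAULT else names,
        "prefix": None if prefix is _NO_DEFAULT else prefix,
    }


def drop_none(kwargs):
    """Return a copy of kwargs without the entries whose value is None."""
    return {key: value for key, value in kwargs.items() if value is not None}


if __name__ == "__main__":
    options = {"names": None, "prefix": None}
    try:
        read_table("file.csv", **options)
    except ValueError as err:
        print("explicit None:", err)
    print("filtered:", read_table("file.csv", **drop_none(options)))
```

Filtering out None values lets the callee fall back to its own defaults, whatever they are, so the caller does not need to know about the sentinel.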

Downgrading to Python 3.8 and Pandas 1.0.4 corrects the problem.

I believe Python 3.8 should be enforced.

Valerieps, Jul 11 '21 22:07

I will take a look. In particular, I want to know if this is a regression due to an outdated datasets package. Can you print datasets.__version__?

luyug, Jul 13 '21 21:07

Hi! This is an issue with pandas 1.3.0. Please update datasets or use an older version of pandas until this is fixed.

lhoestq, Jul 26 '21 14:07
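
For anyone wanting to check their environment before applying either suggestion, a small sketch using only the standard library follows. It assumes that pandas 1.3.0 is the only affected release, since that is the only version named in this thread. `installed_version` and `is_affected` are hypothetical helper names.

```python
from importlib import metadata  # standard library since Python 3.8

AFFECTED_PANDAS = "1.3.0"  # assumption: the only release named in this thread


def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_affected(pandas_version):
    """True only for the exact pandas release reported above."""
    return pandas_version == AFFECTED_PANDAS


if __name__ == "__main__":
    version = installed_version("pandas")
    if version is None:
        print("pandas is not installed")
    elif is_affected(version):
        print(f"pandas {version}: update datasets or pin an older pandas")
    else:
        print(f"pandas {version}: not the release reported in this issue")
```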