
RecursionError when downloading datasets with Python 3.12: set requirements accordingly?

Open: cleong110 opened this issue 1 year ago • 7 comments

Steps to reproduce on my own machine:

```shell
conda create -n sign_language_datasets pip
conda activate sign_language_datasets
python --version  # 3.12 by default
python -m pip install sign-language-datasets webvtt-py
```

Create a `download_dgs_corpus.py` file with the following contents:

```python
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # registers the sign-language datasets (e.g. dgs_corpus) with tfds
from sign_language_datasets.datasets.config import SignDatasetConfig

import itertools
import sys

print(sys.getrecursionlimit())
# sys.setrecursionlimit(50)
# The default settings include both pose and video.
dgs_corpus = tfds.load('dgs_corpus')
```

Run it:

```shell
python download_dgs_corpus.py
```

It works in Colab (Python 3.10), but not on my machine in an environment with Python 3.12. When I create a conda env with Python 3.10, it works without issue.
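Since the download fails here on Python 3.12 but works on 3.10, a script could warn early before calling `tfds.load`. This is a minimal sketch: the helper names and the choice to treat 3.12 and later as affected are assumptions drawn only from the reports in this thread, not a documented TFDS constraint.

```python
import sys
import warnings

# Assumption: based only on this thread, where 3.10 worked and 3.12 raised
# RecursionError. Later versions were not tested here.
FIRST_AFFECTED = (3, 12)


def is_reported_affected(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) is at or above the first version reported to fail."""
    return tuple(version_info[:2]) >= FIRST_AFFECTED


def warn_if_affected_python(version_info=sys.version_info) -> None:
    """Emit a warning (instead of failing) on a version reported to be affected."""
    if is_reported_affected(version_info):
        major, minor = version_info[:2]
        warnings.warn(
            f"Python {major}.{minor}: downloading datasets with some tfds releases "
            "was reported to raise RecursionError; Python 3.10 was reported to work."
        )


if __name__ == "__main__":
    warn_if_affected_python()
```

Warning rather than exiting keeps the script usable once a fixed tfds release is installed.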

~~When I create an env with 3.11, I get "no module named lxml" but that's a different issue~~ Edit: I was installing in my base environment, never mind this part.

This is apparently an upstream issue: https://github.com/tensorflow/datasets/issues/4666

cleong110, Mar 26 '24

~~OK, installed lxml and now I'm getting "Failed to get url https://nlp.biu.ac.il/~amit/datasets/dgs.json. HTTP code: 404.", which seems new but unrelated to this~~ Never mind: Python 3.11 seems to work fine; I was installing in my conda base env.

cleong110, Mar 26 '24

So it really does seem that Python 3.12 is the issue, as noted in https://github.com/tensorflow/datasets/issues/4666.

cleong110, Mar 26 '24

Never mind the nevermind: with Python 3.11 you need to install lxml manually, or downloading the DGS Corpus crashes when using the default config. But that's a DGS-Corpus-specific issue, I suppose, so maybe never mind the neverminding of the nevermind?
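Because the DGS Corpus download with the default config reportedly crashes when `lxml` is missing, a script could check for it up front. This is a minimal standard-library sketch. The `missing_modules` helper is hypothetical, and treating `lxml` as the only extra dependency is an assumption based on this comment.

```python
import importlib.util

# Assumption: per this thread, the default dgs_corpus config needs lxml,
# which was not always installed automatically.
REQUIRED_FOR_DGS = ["lxml"]


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_FOR_DGS)
    if missing:
        # Report only; install with: python -m pip install <names>
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```

`find_spec` locates a module without importing it, so the check is cheap and has no side effects.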

cleong110, Mar 26 '24

Thanks for this.

abir-g, May 25 '24

According to https://github.com/tensorflow/datasets/issues/4666#issuecomment-2149200103, this is now fixed in the latest tfds release (4.9.6).

If we can confirm that, we can close this issue.
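To confirm the fix, you could check whether the installed tensorflow-datasets release is at least 4.9.6, the version the thread names as fixing this. This is a minimal sketch. `release_tuple` is a hypothetical helper that reads only the leading numeric X.Y.Z part and ignores suffixes such as `.dev…`. It is therefore cruder than full PEP 440 comparison: a `4.9.6.devN` nightly would count as fixed, which may not hold.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (4, 9, 6)  # release named in the thread as containing the fix


def release_tuple(ver: str) -> tuple:
    """Extract the leading numeric release, e.g. '4.9.5.dev2024...' -> (4, 9, 5)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"Unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(ver: str) -> bool:
    """True if the numeric release is >= FIXED_IN (pre-release suffixes are ignored)."""
    return release_tuple(ver) >= FIXED_IN


if __name__ == "__main__":
    for dist in ("tensorflow-datasets", "tfds-nightly"):
        try:
            installed = version(dist)
        except PackageNotFoundError:
            continue  # this distribution is not installed
        print(f"{dist} {installed}: at or above 4.9.6 = {has_fix(installed)}")
```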

cleong110, Jun 05 '24

Gave it a go: new conda env, Python 3.12, `pip install sign_language_datasets`. I ended up with tfds-nightly 4.9.5.dev202406050044 rather than 4.9.6, the tfds version that supposedly fixes this.
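In this test the install ended up with `tfds-nightly` rather than `tensorflow-datasets`. Both distributions provide the same `tensorflow_datasets` import package, so it can help to see which one an environment actually has. This is a minimal sketch using `importlib.metadata`, and the helper name is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Both distributions install the `tensorflow_datasets` import package.
TFDS_DISTRIBUTIONS = ("tensorflow-datasets", "tfds-nightly")


def installed_tfds_distributions(names=TFDS_DISTRIBUTIONS) -> dict:
    """Map each installed distribution name to its version; absent ones are skipped."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    found = installed_tfds_distributions()
    if not found:
        print("No TFDS distribution installed.")
    elif len(found) > 1:
        print("Both provide tensorflow_datasets and may conflict:", found)
    else:
        print("Installed:", found)
```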

cleong110, Jun 05 '24

Did some shenanigans: uninstalled tfds-nightly, then ran `pip install tensorflow-datasets`. After that it couldn't be imported, so I ran `pip install -U --force-reinstall tensorflow-datasets`, and now it seems to work.
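The steps in this comment can be written out as `pip` invocations for the current interpreter. This is a minimal sketch. `reinstall_steps` and `run_steps` are hypothetical helpers. The intermediate plain `pip install tensorflow-datasets` is folded into the forced reinstall, on the assumption that the forced reinstall is the step that mattered. Nothing is executed unless you call `run_steps` yourself.

```python
import subprocess
import sys


def reinstall_steps(python=sys.executable):
    """Return pip commands that replace tfds-nightly with tensorflow-datasets."""
    pip = [python, "-m", "pip"]
    return [
        pip + ["uninstall", "-y", "tfds-nightly"],
        # -U picks the latest release; --force-reinstall reinstalls it (and its
        # dependencies) even if pip considers them already up to date.
        pip + ["install", "-U", "--force-reinstall", "tensorflow-datasets"],
    ]


def run_steps(steps) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print only; run_steps(reinstall_steps()) would actually modify the environment.
    for cmd in reinstall_steps():
        print(" ".join(cmd))
```

Using `sys.executable -m pip` targets the interpreter of the active environment, which avoids the base-environment mix-up mentioned earlier in the thread.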

cleong110, Jun 05 '24