                        RecursionError when downloading datasets with python 3.12: set requirements accordingly?
Steps to reproduce on my own machine:
```bash
conda create -n sign_language_datasets pip
conda activate sign_language_datasets
python --version  # 3.12 by default
python -m pip install sign-language-datasets webvtt-py
```

Then create a `download_dgs_corpus.py` file with the following contents:

```python
import sys
import itertools

import tensorflow_datasets as tfds
import sign_language_datasets.datasets
from sign_language_datasets.datasets.config import SignDatasetConfig

print(sys.getrecursionlimit())
# sys.setrecursionlimit(50)

# the default config includes both pose and video
dgs_corpus = tfds.load('dgs_corpus')
```

and run it:

```bash
python download_dgs_corpus.py
```
It works in Colab (Python 3.10), but not on my machine in an env with Python 3.12. When I create a conda env with Python 3.10, it works without issue.
~~When I create an env with 3.11, I get "no module named lxml" but that's a different issue~~ edit: I was installing in my base environment, never mind this part
This is apparently an upstream issue: https://github.com/tensorflow/datasets/issues/4666.
~~OK, installed lxml and now I'm getting "Failed to get url https://nlp.biu.ac.il/~amit/datasets/dgs.json. HTTP code: 404.", which seems new but unrelated to this~~ never mind, Python 3.11 seems to work fine; I was installing in my conda base env.
So it really does seem that Python 3.12 is the issue, as noted in https://github.com/tensorflow/datasets/issues/4666.
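Since the title asks whether we should set requirements accordingly, here's a minimal sketch of what that could look like, assuming a setuptools-based `setup.py` (the actual packaging layout and dependency list of this repo may differ):

```python
# Hypothetical setup.py fragment: exclude Python 3.12 until the upstream
# tfds RecursionError (tensorflow/datasets#4666) is fixed in a release we
# depend on, or alternatively require a tfds version that contains the fix.
from setuptools import setup, find_packages

setup(
    name="sign-language-datasets",
    packages=find_packages(),
    python_requires=">=3.8,<3.12",  # drop the cap once the tfds fix is confirmed
    install_requires=[
        "tensorflow-datasets",      # or "tensorflow-datasets>=4.9.6" once verified
    ],
)
```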
Never mind the never mind: if you have Python 3.11, you need to manually install lxml, or the DGS corpus download crashes with the default config. But that's a DGS-corpus-specific issue, I suppose, so never mind the neverminding of the never mind, maybe?
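If the lxml thing stays DGS-corpus-specific, a fail-fast check along these lines (hypothetical, not code that exists in the loader) would at least surface the problem before the download starts:

```python
# Hypothetical guard for the dgs_corpus default (video + pose) config:
# report the missing dependency up front instead of crashing mid-download.
try:
    import lxml  # noqa: F401
except ImportError as err:
    raise ImportError(
        "The dgs_corpus default config needs lxml; install it with "
        "`pip install lxml`, or declare it as a dependency/extra of "
        "sign-language-datasets."
    ) from err
```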
Thanks for this.
According to https://github.com/tensorflow/datasets/issues/4666#issuecomment-2149200103, this is now fixed in the latest version of tfds.
If we can confirm that, we can close this issue.
Gave it a go. New conda env, Python 3.12, `pip install sign_language_datasets`. Ended up with tfds-nightly 4.9.5.dev202406050044, not 4.9.6, the tfds version that supposedly fixes this.
Did some shenanigans: uninstalled tfds-nightly, then `pip install tensorflow-datasets`; after that the package couldn't be imported, so I ran `pip install -U --force-reinstall tensorflow-datasets`, and now it seems to work.
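To double-check what actually ended up installed after all that (standard library only, nothing repo-specific assumed), something like this shows which tfds distribution and version won out:

```python
from importlib import metadata

# Per the upstream thread, the Python 3.12 RecursionError fix is in tfds 4.9.6,
# so anything older (including a pre-4.9.6 nightly) may still be affected.
for dist in ("tensorflow-datasets", "tfds-nightly"):
    try:
        print(dist, metadata.version(dist))
    except metadata.PackageNotFoundError:
        print(dist, "not installed")
```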