
TFDS hangs when downloading/loading WIT, seems related to Apache Beam and google.cloud.bigquery_storage_v1

Open AntreasAntoniou opened this issue 1 year ago • 0 comments


Short description I am trying to use the WIT dataset. With the download=True flag, the program either hangs or takes a very long time to run, and gives no information about what it is doing.

The exact code I am using is:

import tensorflow_datasets as tfds

subset, subset_info = tfds.load(
    name="wit",
    split="val",
    shuffle_files=False,
    download=True,
    as_supervised=False,
    data_dir=args.source_dataset_dir,
    with_info=True,
)

Environment information

  • Operating System: Ubuntu 22.04

  • Python version: 3.8.13

  • tf-estimator-nightly 2.10.0.dev2022070408 pypi_0 pypi

  • tf-nightly 2.10.0.dev20220704 pypi_0 pypi

  • tfds-nightly 4.6.0.dev202207080047 pypi_0 pypi

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes

Reproduction instructions

import tensorflow_datasets as tfds

subset, subset_info = tfds.load(
    name="wit",
    split="val",
    shuffle_files=False,
    download=True,
    as_supervised=False,
    data_dir="/tmp/tfds/data",
    with_info=True,
)
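To see which stage stalls, the same load can be split into its two steps. This is only a sketch: it assumes that tfds.load with download=True amounts to tfds.builder(...), then download_and_prepare(), then as_dataset(...), as the TFDS documentation describes. The helper name prepare_then_load is hypothetical and not part of TFDS.

```python
import logging

# Emit INFO-level messages so each stage announces itself.
logging.basicConfig(level=logging.INFO)


def prepare_then_load(name: str, split: str, data_dir: str):
    """Run the TFDS prepare step and the read step as separate, logged calls.

    Hypothetical helper for debugging; it is not part of TFDS itself.
    """
    # Imported lazily so the logging configuration above is in place first.
    import tensorflow_datasets as tfds

    builder = tfds.builder(name, data_dir=data_dir)

    logging.info("starting download_and_prepare for %s", name)
    # For Beam-based datasets, the Beam pipeline runs inside this call.
    builder.download_and_prepare()
    logging.info("download_and_prepare finished")

    # Reading the prepared files is a separate step.
    dataset = builder.as_dataset(split=split, shuffle_files=False)
    return dataset, builder.info
```

Calling prepare_then_load("wit", "val", "/tmp/tfds/data") should mirror the tfds.load call above. If the second log line never appears, the stall is in the prepare step, not in reading the dataset.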


Link to logs https://gist.github.com/AntreasAntoniou/73feea24699c60bc94411ad81c73f8d2

Expected behavior The code should download the required dataset and load it. Instead, it downloads the data and then appears to hang. I understand that TFDS states that datasets using Apache Beam can take a long time to load, but the current behaviour gives no indication of whether something is still being processed or the program is stuck. I let this run for 12 hours on an 8-core AMD CPU, hoping for some feedback, but none ever came.
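Until TFDS reports progress itself, a standard-library watchdog can help tell "slow" from "stuck". The sketch below is not a TFDS feature, and the helpers run_with_heartbeat and directory_size_bytes are hypothetical names. It logs the elapsed time and the size of the data directory at a fixed interval, and uses faulthandler to dump every thread's Python stack periodically.

```python
import faulthandler
import logging
import os
import sys
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def directory_size_bytes(path: str) -> int:
    """Total size of all regular files under `path` (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can vanish while a pipeline renames temporary shards.
                pass
    return total


def run_with_heartbeat(fn: Callable[[], T], watch_dir: str, interval_s: float = 60.0) -> T:
    """Run `fn` while a daemon thread logs elapsed time and the size of `watch_dir`."""
    stop = threading.Event()
    start = time.monotonic()

    def heartbeat() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(interval_s):
            logging.info(
                "still running after %.0f s; %s holds %d bytes",
                time.monotonic() - start,
                watch_dir,
                directory_size_bytes(watch_dir),
            )

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        return fn()
    finally:
        stop.set()
        thread.join()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Dump all Python stacks to stderr every 10 minutes until cancelled.
    faulthandler.dump_traceback_later(600, repeat=True, file=sys.stderr)
    # Placeholder workload; the tfds.load(...) call would go here instead.
    run_with_heartbeat(lambda: time.sleep(1), watch_dir="/tmp/tfds/data", interval_s=0.5)
    faulthandler.cancel_dump_traceback_later()
```

A directory size that keeps growing suggests the job is slow but still working. A constant size, together with stack dumps that show the same frames each time, points to a real hang. The faulthandler dump is written by a watchdog thread that does not need the interpreter lock, so it should still fire if the main thread is blocked inside native code.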


AntreasAntoniou, Jul 08 '22 18:07