
mocking.mock_data broken by https://github.com/tensorflow/datasets/commit/e8d8966e888667091875fd885ab0803a6dc5a383

Open ruomingp opened this issue 2 years ago • 2 comments


Short description: https://github.com/tensorflow/datasets/commit/e8d8966e888667091875fd885ab0803a6dc5a383 broke mock_data().

Environment information

  • Operating System: Linux

  • Python version: 3.7.7

  • tensorflow-datasets/tfds-nightly version: 4.5.2.dev202203220044

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)?

Reproduction instructions

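  import tensorflow_datasets as tfds
  from tensorflow_datasets.testing import mock_data
  with mock_data(num_examples=40):
    builder = tfds.builder("imagenet2012", data_dir="...")  # data_dir value elided in the report
    ...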

Link to logs

input_tfds.py:74: in _build_dataset
    builder = tfds.builder(cfg.dataset_name, data_dir=cfg.data_dir)
/miniconda/envs/py377/lib/python3.7/site-packages/tensorflow_datasets/core/load.py:149: in builder
    community.community_register.has_namespace(name.namespace)):
/miniconda/envs/py377/lib/python3.7/site-packages/tensorflow_datasets/core/community/registry.py:124: in has_namespace
    return namespace in self.registers_per_namespace
/miniconda/envs/py377/lib/python3.7/site-packages/tensorflow_datasets/core/community/registry.py:121: in registers_per_namespace
    return self.namespace_config.registers_per_namespace()
/miniconda/envs/py377/lib/python3.7/site-packages/tensorflow_datasets/core/community/registry.py:93: in registers_per_namespace
    config = toml.loads(self.config_path.read_text())
/miniconda/envs/py377/lib/python3.7/site-packages/etils/epath/abstract_path.py:141: in read_text
    return f.read()
/miniconda/envs/py377/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py:114: in read
    self._preread_check()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 self = <tensorflow.python.platform.gfile.GFile object at 0x7f4e18542450>
     def _preread_check(self):
      if not self._read_buf:
        if not self._read_check_passed:
          raise errors.PermissionDeniedError(None, None,
                                             "File isn't open for reading")
        self._read_buf = _pywrap_file_io.BufferedInputStream(
>           compat.path_to_str(self.__name), 1024 * 512)
E       tensorflow.python.framework.errors_impl.NotFoundError: /miniconda/envs/py377/lib/python3.7/site-packages/tensorflow_datasets/community-datasets.toml; No such file or directory
 /miniconda/envs/py377/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py:77: NotFoundError

Expected behavior: tfds should not try to read community-datasets.toml.


ruomingp, Mar 22 '22 20:03
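The first traceback shows why the missing file breaks an unrelated call: tfds.builder() calls has_namespace(), which parses community-datasets.toml lazily on first access, so a file absent from the installed package only surfaces deep inside builder(). The sketch below models that pattern with a hypothetical LazyNamespaceRegistry (not the TFDS class) and shows one way to make a missing config harmless, by treating it as "no community namespaces". The parser is injected and replaced by a toy "name = path" format so the example stays stdlib-only; TFDS itself calls toml.loads on the file.

```python
import functools
import pathlib
from typing import Callable, Dict, List

# Hypothetical parser type: config text -> {namespace: [register paths]}.
Parser = Callable[[str], Dict[str, List[str]]]


class LazyNamespaceRegistry:
    """Hypothetical stand-in for a lazily loaded community-namespace registry."""

    def __init__(self, config_path: pathlib.Path, parse: Parser,
                 allow_missing: bool = False) -> None:
        self._config_path = config_path
        self._parse = parse
        self._allow_missing = allow_missing

    @functools.cached_property
    def registers_per_namespace(self) -> Dict[str, List[str]]:
        # The config is read on first access only, so a file missing from the
        # install surfaces inside whichever call touches the registry first.
        if self._allow_missing and not self._config_path.is_file():
            return {}  # Treat a missing config as "no community namespaces".
        return self._parse(self._config_path.read_text())

    def has_namespace(self, namespace: str) -> bool:
        return namespace in self.registers_per_namespace


def parse_simple(text: str) -> Dict[str, List[str]]:
    """Toy 'name = path' parser standing in for TOML (hypothetical)."""
    result: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if "=" in line:
            name, path = (part.strip() for part in line.split("=", 1))
            result.setdefault(name, []).append(path)
    return result
```

With allow_missing=False (the behavior in the traceback), has_namespace() raises FileNotFoundError when the file is absent. TFDS reads the file through TensorFlow's GFile, so the same condition shows up there as NotFoundError.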

I see similar issues with the load and list_builders methods when running tfds-nightly on Colab. Output of tfds.list_builders():



<ipython-input-5-89a978348cb8> in <module>()
----> 1 tfds.list_builders()


/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/load.py in list_builders(with_community_datasets)
     64   if with_community_datasets:
     65     if visibility.DatasetType.COMMUNITY_PUBLIC.is_available():
---> 66       datasets += community.community_register.list_builders()
     67   return datasets
     68 

/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/community/registry.py in list_builders(self)
    129   def list_builders(self) -> List[str]:
    130     builders = []
--> 131     for registers in self.registers_per_namespace.values():
    132       for register in registers:
    133         builders.extend(register.list_builders())

/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/community/registry.py in registers_per_namespace(self)
    119   def registers_per_namespace(
    120       self) -> Mapping[str, List[register_base.BaseRegister]]:
--> 121     return self.namespace_config.registers_per_namespace()
    122 
    123   def has_namespace(self, namespace: str) -> bool:

/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/community/registry.py in registers_per_namespace(self)
     91       RuntimeError: when the config contains errors.
     92     """
---> 93     config = toml.loads(self.config_path.read_text())
     94     registers_per_namespace = {}
     95     for namespace, path_or_paths in config['Namespaces'].items():

/usr/local/lib/python3.7/dist-packages/etils/epath/abstract_path.py in read_text(self, encoding)
    139     """Reads contents of self as bytes."""
    140     with self.open('r', encoding=encoding) as f:
--> 141       return f.read()
    142 
    143   # ====== Write methods ======

/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/file_io.py in read(self, n)
    112       string if in string (regular) mode.
    113     """
--> 114     self._preread_check()
    115     if n == -1:
    116       length = self.size() - self.tell()

/usr/local/lib/python3.7/dist-packages/tensorflow/python/lib/io/file_io.py in _preread_check(self)
     75                                            "File isn't open for reading")
     76       self._read_buf = _pywrap_file_io.BufferedInputStream(
---> 77           compat.path_to_str(self.__name), 1024 * 512)
     78 
     79   def _prewrite_check(self):

NotFoundError: /usr/local/lib/python3.7/dist-packages/tensorflow_datasets/community-datasets.toml; No such file or directory

texasfight, Mar 22 '22 21:03

Hi! Thanks for reporting this. This should now be fixed (that file wasn't included in the package, so we added it). Could you retry? Kind regards, Tom

tomvdw, Mar 24 '22 12:03
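Since the root cause was a data file missing from the built package, one quick check after upgrading is whether the installed tensorflow_datasets directory now contains community-datasets.toml (the tracebacks expect it at the package root). A minimal stdlib sketch; packaged_file_exists is a hypothetical helper, and it assumes a regular on-disk install (not a zipped egg), which matches the site-packages and dist-packages paths in the tracebacks. importlib.util.find_spec locates a top-level package without executing its __init__.py, so the check does not import TensorFlow.

```python
import importlib.util
import pathlib


def packaged_file_exists(package: str, relative_path: str) -> bool:
    """Return True if relative_path exists inside the installed package directory.

    Hypothetical helper; assumes a regular on-disk install, not a zip/egg.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a plain module / built-in rather than a package.
        return False
    return any(
        (pathlib.Path(location) / relative_path).is_file()
        for location in spec.submodule_search_locations
    )


if __name__ == "__main__":
    # Prints False on an affected install (or when tfds is not installed).
    print(packaged_file_exists("tensorflow_datasets", "community-datasets.toml"))
```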