
build_image_dataset.py crashes for delf training

Open avidullu opened this issue 3 years ago • 4 comments

Running the instructions from https://github.com/tensorflow/models/blob/master/research/delf/delf/python/training/README.md#prepare-the-data-for-training on a Google Cloud VM without any GPUs fails with an error.

Below is the command, followed by the error:

    python3 build_image_dataset.py \
      --train_csv_path=$LANDMARK_DATA/train/train.csv \
      --train_clean_csv_path=$LANDMARK_DATA/train/train_clean.csv \
      --train_directory=$LANDMARK_DATA/train//// \
      --output_directory=$LANDMARK_DATA/tfrecord/ \
      --num_shards=128 \
      --generate_train_validation_splits \
      --validation_split_size=0.2 \
      --test_csv_path=$LANDMARK_DATA/train/test.csv \
      --test_directory=$LANDMARK_DATA/test////

    2022-01-27 22:07:03.277440: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
    2022-01-27 22:07:03.277495: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
    2022-01-27 22:07:05.252430: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
    2022-01-27 22:07:05.252502: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
    2022-01-27 22:07:05.252525: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (gcsfuse-experiment): /proc/driver/nvidia/version does not exist
    /home/avidullu/mldata/cvdfoundation/google-landmark/train/train_clean.csv
    Traceback (most recent call last):
      File "build_image_dataset.py", line 491, in <module>
        app.run(main)
      File "/home/avidullu/.local/lib/python3.7/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/home/avidullu/.local/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "build_image_dataset.py", line 485, in main
        FLAGS.seed)
      File "build_image_dataset.py", line 439, in _build_train_tfrecord_dataset
        image_dir)
      File "build_image_dataset.py", line 144, in _get_clean_train_image_files_and_labels
        df = pd.read_csv(csv_file)
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
        return func(*args, **kwargs)
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
        return _read(filepath_or_buffer, kwds)
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 482, in _read
        parser = TextFileReader(filepath_or_buffer, **kwds)
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
        self._engine = self._make_engine(self.engine)
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
        return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
        self._open_handles(src, kwds)
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/io/parsers/base_parser.py", line 229, in _open_handles
        errors=kwds.get("encoding_errors", "strict"),
      File "/home/avidullu/.local/lib/python3.7/site-packages/pandas/io/common.py", line 724, in get_handle
        newline="",
    AttributeError: 'GFile' object has no attribute 'readable'

The code at https://github.com/tensorflow/models/blob/a033df775262a1a48420649a216a16b687bc39f6/research/delf/delf/python/training/build_image_dataset.py#L143 appears to open the CSV file in binary mode. After removing the 'b' from the mode there and at L116, the script makes progress.
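For anyone who cannot change the open mode, a possible alternative is to read the whole file into memory and parse it from a standard in-memory buffer. The sketch below uses only the standard library so that it runs anywhere. `MinimalBinaryReader` is a hypothetical stand-in for a file object that has `read()` but no `readable()` (the attribute named in the traceback), `rows_from_reader` is an illustrative helper that is not part of build_image_dataset.py, and the sample CSV bytes are made up. It assumes the failure comes from wrapping the binary handle in a text wrapper, which is what the `newline=""` frame in pandas/io/common.py suggests.

```python
import csv
import io


class MinimalBinaryReader:
    """Hypothetical stand-in for a file object that, like the GFile in the
    traceback, offers read() but not readable()."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def read(self) -> bytes:
        return self._payload


def rows_from_reader(reader) -> list:
    """Read everything into memory, decode it, and parse it as text CSV."""
    text = reader.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample data, only for illustration.
source = MinimalBinaryReader(b"id,landmark_id\nabc123,7\n")

# Wrapping the incomplete object directly fails with an AttributeError,
# because the text wrapper expects the full binary file interface.
try:
    io.TextIOWrapper(source, encoding="utf-8")
    wrap_failed = False
except AttributeError:
    wrap_failed = True

# Going through an in-memory buffer avoids touching the missing methods.
rows = rows_from_reader(source)
print(wrap_failed, rows)
```

The same idea should carry over to pandas by passing an `io.StringIO` (or `io.BytesIO`) built from the file contents to `pd.read_csv`, at the cost of holding the whole CSV in memory. Opening the file in text mode, as described above, remains the smaller change.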

avidullu, Jan 27 '22 22:01