
Error: Training on ImageNet from scratch

Open Basawaraj1 opened this issue 3 years ago • 2 comments

I am trying to train efficientnetv2 on ImageNet from scratch using the command from https://github.com/google/automl/tree/master/efficientnetv2

I am using the most recent version of the code (as of May 30th).

The command I execute:

python main.py --mode=train --model_name=efficientnetv2-s --dataset_cfg=imagenet --model_dir='/home/ar2/Desktop/User/automl/v2a_models' --use_tpu=False

I get the following error:

ValueError: Invalid argument to flush(): <tf.Tensor 'create_file_writer/SummaryWriter:0' shape=() dtype=resource>

I have the following packages installed (on Ubuntu 18.04):

packages in environment at /home/ar2/anaconda3/envs/automl1:

Name  Version  Build  Channel
_libgcc_mutex  0.1  conda_forge  conda-forge
_openmp_mutex  4.5  2_gnu  conda-forge
absl-py  1.1.0  pypi_0  pypi
astunparse  1.6.3  pypi_0  pypi
ca-certificates  2022.6.15  ha878542_0  conda-forge
cachetools  5.2.0  pypi_0  pypi
certifi  2022.6.15  pypi_0  pypi
charset-normalizer  2.1.0  pypi_0  pypi
cudatoolkit  11.2.2  hbe64b41_10  conda-forge
cudnn  8.1.0.77  h90431f1_0  conda-forge
cycler  0.11.0  pypi_0  pypi
dill  0.3.5.1  pypi_0  pypi
etils  0.6.0  pypi_0  pypi
flatbuffers  1.12  pypi_0  pypi
fonttools  4.33.3  pypi_0  pypi
gast  0.4.0  pypi_0  pypi
google-auth  2.9.0  pypi_0  pypi
google-auth-oauthlib  0.4.6  pypi_0  pypi
google-pasta  0.2.0  pypi_0  pypi
googleapis-common-protos  1.56.3  pypi_0  pypi
grpcio  1.47.0  pypi_0  pypi
h5py  3.7.0  pypi_0  pypi
idna  3.3  pypi_0  pypi
importlib-metadata  4.12.0  pypi_0  pypi
importlib-resources  5.8.0  pypi_0  pypi
keras  2.8.0  pypi_0  pypi
keras-preprocessing  1.1.2  pypi_0  pypi
kiwisolver  1.4.3  pypi_0  pypi
ld_impl_linux-64  2.36.1  hea4e1c9_2  conda-forge
libclang  14.0.1  pypi_0  pypi
libffi  3.3  h58526e2_2  conda-forge
libgcc-ng  12.1.0  h8d9b700_16  conda-forge
libgomp  12.1.0  h8d9b700_16  conda-forge
libstdcxx-ng  12.1.0  ha89aaad_16  conda-forge
libzlib  1.2.12  h166bdaf_1  conda-forge
markdown  3.3.7  pypi_0  pypi
matplotlib  3.5.2  pypi_0  pypi
ncurses  6.3  h27087fc_1  conda-forge
numpy  1.21.6  pypi_0  pypi
oauthlib  3.2.0  pypi_0  pypi
openssl  1.1.1p  h166bdaf_0  conda-forge
opt-einsum  3.3.0  pypi_0  pypi
packaging  21.3  pypi_0  pypi
pillow  9.2.0  pypi_0  pypi
pip  22.1.2  pyhd8ed1ab_0  conda-forge
promise  2.3  pypi_0  pypi
protobuf  3.19.4  pypi_0  pypi
pyasn1  0.4.8  pypi_0  pypi
pyasn1-modules  0.2.8  pypi_0  pypi
pyparsing  3.0.9  pypi_0  pypi
python  3.7.13  h12debd9_0
python-dateutil  2.8.2  pypi_0  pypi
python_abi  3.7  2_cp37m  conda-forge
pyyaml  6.0  py37h540881e_4  conda-forge
readline  8.1.2  h0f457ee_0  conda-forge
requests  2.28.1  pypi_0  pypi
requests-oauthlib  1.3.1  pypi_0  pypi
rsa  4.8  pypi_0  pypi
setuptools  62.6.0  py37h89c1867_0  conda-forge
six  1.16.0  pypi_0  pypi
sqlite  3.39.0  h4ff8645_0  conda-forge
tensorboard  2.8.0  pypi_0  pypi
tensorboard-data-server  0.6.1  pypi_0  pypi
tensorboard-plugin-wit  1.8.1  pypi_0  pypi
tensorflow  2.8.2  pypi_0  pypi
tensorflow-addons  0.17.1  pypi_0  pypi
tensorflow-estimator  2.8.0  pypi_0  pypi
tensorflow-gpu  2.8.2  pypi_0  pypi
tensorflow-io-gcs-filesystem  0.26.0  pypi_0  pypi
tensorflow-metadata  1.9.0  pypi_0  pypi
termcolor  1.1.0  pypi_0  pypi
tfds-nightly  4.6.0.dev202207040045  pypi_0  pypi
tk  8.6.12  h27826a3_0  conda-forge
toml  0.10.2  pypi_0  pypi
tqdm  4.64.0  pypi_0  pypi
typeguard  2.13.3  pypi_0  pypi
typing-extensions  4.2.0  pypi_0  pypi
urllib3  1.26.9  pypi_0  pypi
werkzeug  2.1.2  pypi_0  pypi
wheel  0.37.1  pyhd8ed1ab_0  conda-forge
wrapt  1.14.1  pypi_0  pypi
xz  5.2.5  h516909a_1  conda-forge
yaml  0.2.5  h7b6447c_0  anaconda
zipp  3.8.0  pypi_0  pypi
zlib  1.2.12  h166bdaf_1  conda-forge
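The list above shows `tensorflow` and `tensorflow-gpu` both at 2.8.2, with `tensorflow-estimator`, `keras` and `tensorboard` at 2.8.0. Anyone trying to reproduce the report may want to confirm what their own interpreter actually sees. The following is only a minimal diagnostic sketch, not something from the thread: the helper names `installed_versions` and `minor_series` are hypothetical, and the choice of packages to compare is an assumption about what is relevant here.

```python
# Python 3.8+ ships importlib.metadata; on Python 3.7 (as in the report) the
# importlib-metadata backport would be imported as `importlib_metadata` instead.
from importlib import metadata

# Assumption: these are the packages worth comparing for this report.
PACKAGES = ["tensorflow", "tensorflow-gpu", "tensorflow-estimator", "keras", "tensorboard"]


def installed_versions(names):
    """Return {package name: installed version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def minor_series(version):
    """Reduce a version string such as '2.8.2' to its 'major.minor' part."""
    if version is None:
        return None
    return ".".join(version.split(".")[:2])


# Print one line per package so differing minor series are easy to spot.
for name, version in installed_versions(PACKAGES).items():
    print(f"{name:24s} {version or 'not installed'} (series: {minor_series(version)})")
```

If the printed series differ between packages, that may be worth mentioning in the report. The thread itself does not establish that a version mismatch causes this error.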

Additional output on the console before the error:

INFO:tensorflow:Done calling model_fn.
I0704 11:34:12.721288 140345024496576 estimator.py:1175] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0704 11:34:12.722298 140345024496576 basic_session_run_hooks.py:558] Create CheckpointSaverHook.
INFO:tensorflow:training_loop marked as finished
I0704 11:34:15.935363 140345024496576 error_handling.py:115] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W0704 11:34:15.935580 140345024496576 error_handling.py:149] Reraising captured error
Traceback (most recent call last):
  File "main.py", line 503, in <module>
    app.run(main)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "main.py", line 496, in main
    input_fn=ds_lab_cls.input_fn, max_steps=max_steps, hooks=hooks)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3102, in train
    rendezvous.raise_errors()
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
    six.reraise(typ, value, traceback)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3097, in train
    saving_listeners=saving_listeners)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 360, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1186, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1219, in _train_model_default
    saving_listeners)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1525, in _train_with_estimator_spec
    save_graph_def=self._config.checkpoint_save_graph_def) as mon_sess:
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 612, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 1058, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow/python/training/monitored_session.py", line 745, in __init__
    h.begin()
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2335, in begin
    self._finalize_ops.append(tf.compat.v2.summary.flush(writer=op.inputs[0]))
  File "/home/ar2/anaconda3/envs/automl1/lib/python3.7/site-packages/tensorflow/python/ops/summary_ops_v2.py", line 1119, in flush
    raise ValueError("Invalid argument to flush(): %r" % (writer,))
ValueError: Invalid argument to flush(): <tf.Tensor 'create_file_writer/SummaryWriter:0' shape=() dtype=resource>

Basawaraj1, Jul 04 '22 06:07

same error

lbf4616, Aug 23 '22 02:08

same error

Try using main_tf2.py instead of main.py.

lbf4616, Aug 23 '22 02:08
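For readers who want to try that suggestion, here is a hedged sketch of what the adjusted invocation might look like. It reuses the flags from the original command unchanged and swaps only the script name. Whether main_tf2.py accepts exactly the same flags is an assumption that the thread does not confirm, and the helper `swap_script` is hypothetical.

```python
import shlex

# Flags copied from the command in the report; only the script name changes.
# Assumption: main_tf2.py accepts the same flags as main.py (not confirmed in the thread).
ORIGINAL_ARGV = [
    "python", "main.py",
    "--mode=train",
    "--model_name=efficientnetv2-s",
    "--dataset_cfg=imagenet",
    "--model_dir=/home/ar2/Desktop/User/automl/v2a_models",
    "--use_tpu=False",
]


def swap_script(argv, old="main.py", new="main_tf2.py"):
    """Return a copy of argv with the script name replaced; flags are untouched."""
    return [new if arg == old else arg for arg in argv]


tf2_argv = swap_script(ORIGINAL_ARGV)

# Print a shell-safe version of the command rather than running it here.
print(" ".join(shlex.quote(arg) for arg in tf2_argv))
```

To launch the run, the list could be passed to `subprocess.run(tf2_argv, check=True)` from the efficientnetv2 directory. If main_tf2.py rejects a flag, its own `--help` output is the place to check.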