When fine-tuning ALBERT on SQUAD 1.1 - TypeError: Expected binary or unicode string, got None

Open vendyv opened this issue 4 years ago • 13 comments

I tried to run the 'run_squad_v1' script exactly as described, but got TypeError: Expected binary or unicode string, got None

vendyv avatar Jan 09 '20 14:01 vendyv

Can you give the whole command, so that we can attempt to reproduce it? Also, could you give your TF and Python version numbers?

0x0539 avatar Jan 09 '20 18:01 0x0539

Here is the command: python3 -m run_squad_v1 --albert_config_file='/home/vendy/Desktop/ALBERT-master/albert_base/albert_config.json' --output_dir='/home/vendy/Desktop/ALBERT-master/tmp' --train_file='/home/vendy/Desktop/ALBERT-master/SQUAD data/train-v1.1.json' --predict_file='/home/vendy/Desktop/ALBERT-master/SQUAD data/dev-v1.1.json' --spm_model_file='/home/vendy/Desktop/ALBERT-master/albert_base/30k-clean.model' --do_lower_case --max_seq_length=384 --doc_stride=128 --max_query_length=64 --do_train=true --do_predict=true --train_batch_size=48 --predict_batch_size=8 --learning_rate=5e-5 --num_train_epochs=2.0 --warmup_proportion=.1 --save_checkpoints_steps=5000 --n_best_size=20 --max_answer_length=30

PY: 3.6 TF: 1.15

vendyv avatar Jan 10 '20 09:01 vendyv

@vendyv I ran into this same issue, then I realised the script expects you to supply the train_feature_file parameter. You can give it a path to a file (*.tfrecord) and if it doesn't exist, it will create one for you.

spark-ming avatar Jan 11 '20 04:01 spark-ming

Hello, I have run into the same error as you. Did you solve it?

MaybeLL avatar Jan 13 '20 07:01 MaybeLL

@MaybeLL Do you have the train_feature_file parameter defined? It fixed it for me when I added the parameter

spark-ming avatar Jan 13 '20 07:01 spark-ming

I don't know whether train_feature_file is the same as train_file, but I only have the train_file parameter, for run_race.py. It looks like:

--train_file=/home/dy/Project/ALBERT/train_file/train.tfrecord

and here is the comment about it:

flags.DEFINE_string("train_file", None, "path to preprocessed tfrecord file. " "The file will be generated if not exst.")

MaybeLL avatar Jan 13 '20 09:01 MaybeLL

Below is the error I am facing while running the command:


python -m run_squad_v1 \
   --albert_config_file=/media/xxxx/NewVolume/ALBERT/albert_base/albert_config.json \
   --output_dir=/media/xxxx/NewVolume/ALBERT/tmp \
   --train_file=/media/xxxx/NewVolume/ALBERT/data1/train-v1.1.json \
   --predict_file=/media/xxxx/NewVolume/ALBERT/data1/dev-v1.1.json \
   --spm_model_file=/media/xxxx/NewVolume/ALBERT/albert_base/30k-clean.model \
   --do_lower_case \
   --max_seq_length=384 \
   --doc_stride=128 \
   --max_query_length=64 \
   --do_train=false \
   --do_predict=true \
   --train_batch_size=48 \
   --predict_batch_size=8 \
   --learning_rate=5e-5 \
   --num_train_epochs=2.0 \
   --warmup_proportion=.1 \
   --save_checkpoints_steps=5000 \
   --n_best_size=20 \
   --max_answer_length=30
WARNING:tensorflow:From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:206: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

W0113 15:12:16.637617 140307062036288 module_wrapper.py:139] From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:206: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

INFO:tensorflow:loading sentence piece model
I0113 15:12:16.637814 140307062036288 tokenization.py:240] loading sentence piece model
WARNING:tensorflow:Estimator's model_fn (<function v1_model_fn_builder.<locals>.model_fn at 0x7f9b633440d0>) includes params argument, but params are not passed to Estimator.
W0113 15:12:17.200998 140307062036288 estimator.py:1994] Estimator's model_fn (<function v1_model_fn_builder.<locals>.model_fn at 0x7f9b633440d0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '/media/xxxx/NewVolume/ALBERT/tmp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9b66fd70b8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0113 15:12:17.201757 140307062036288 estimator.py:212] Using config: {'_model_dir': '/media/xxxx/NewVolume/ALBERT/tmp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f9b66fd70b8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0113 15:12:17.202082 140307062036288 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0113 15:12:17.202302 140307062036288 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:303: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

W0113 15:12:17.202427 140307062036288 module_wrapper.py:139] From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:303: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:309: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

W0113 15:12:17.317886 140307062036288 module_wrapper.py:139] From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:309: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

Traceback (most recent call last):
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/xxxx/NewVolume/ALBERT/run_squad_v1.py", line 478, in <module>
    tf.compat.v1.app.run()
  File "/home/xxxx/.local/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/xxxx/.local/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/xxxx/.local/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/media/xxxx/NewVolume/ALBERT/run_squad_v1.py", line 309, in main
    if (tf.gfile.Exists(FLAGS.predict_feature_file) and tf.gfile.Exists(
  File "/home/xxxx/.local/lib/python3.6/site-packages/tensorflow_core/python/lib/io/file_io.py", line 262, in file_exists
    return file_exists_v2(filename)
  File "/home/xxxx/.local/lib/python3.6/site-packages/tensorflow_core/python/lib/io/file_io.py", line 280, in file_exists_v2
    pywrap_tensorflow.FileExists(compat.as_bytes(path))
  File "/home/xxxx/.local/lib/python3.6/site-packages/tensorflow_core/python/util/compat.py", line 71, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got None

aravindchaluvadi avatar Jan 13 '20 09:01 aravindchaluvadi

You still need to supply a path to a tfrecord file for train_feature_file. It is different from feature_file. You can specify a filename that does not exist yet, and it will create one for you.

spark-ming avatar Jan 14 '20 07:01 spark-ming

Try

python -m run_squad_v1 \
   --albert_config_file=/media/xxxx/NewVolume/ALBERT/albert_base/albert_config.json \
   --output_dir=/media/xxxx/NewVolume/ALBERT/tmp \
   --train_file=/media/xxxx/NewVolume/ALBERT/data1/train-v1.1.json \
   --train_feature_file=/media/xxxx/NewVolume/ALBERT/data1/feature_file.tfrecord \
   --predict_file=/media/xxxx/NewVolume/ALBERT/data1/dev-v1.1.json \
   --spm_model_file=/media/xxxx/NewVolume/ALBERT/albert_base/30k-clean.model \
   --do_lower_case \
   --max_seq_length=384 \
   --doc_stride=128 \
   --max_query_length=64 \
   --do_train=false \
   --do_predict=true \
   --train_batch_size=48 \
   --predict_batch_size=8 \
   --learning_rate=5e-5 \
   --num_train_epochs=2.0 \
   --warmup_proportion=.1 \
   --save_checkpoints_steps=5000 \
   --n_best_size=20 \
   --max_answer_length=30

spark-ming avatar Jan 14 '20 07:01 spark-ming

@spark-ming

Yes, I have tried the same, but I am still facing the unicode error:


python -m run_squad_v1    --albert_config_file=/media/xxxx/NewVolume/ALBERT/albert_base/albert_config.json    --output_dir=/media/xxxx/NewVolume/ALBERT/tmp    --train_file=/media/xxxx/NewVolume/ALBERT/data1/train-v1.1.json    --train_feature_file=/media/xxxx/NewVolume/ALBERT/data1/feature_file.tfrecord    --predict_file=/media/xxxx/NewVolume/ALBERT/data1/dev-v1.1.json    --spm_model_file=/media/xxxx/NewVolume/ALBERT/albert_base/30k-clean.model    --do_lower_case    --max_seq_length=384    --doc_stride=128    --max_query_length=64    --do_train=false    --do_predict=true    --train_batch_size=48    --predict_batch_size=8    --learning_rate=5e-5    --num_train_epochs=2.0    --warmup_proportion=.1    --save_checkpoints_steps=5000    --n_best_size=20    --max_answer_length=30

WARNING:tensorflow:From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:206: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

W0114 16:47:02.894027 140206005626688 module_wrapper.py:139] From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:206: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

INFO:tensorflow:loading sentence piece model
I0114 16:47:02.894243 140206005626688 tokenization.py:240] loading sentence piece model
WARNING:tensorflow:Estimator's model_fn (<function v1_model_fn_builder.<locals>.model_fn at 0x7f83d77c9ae8>) includes params argument, but params are not passed to Estimator.
W0114 16:47:03.487462 140206005626688 estimator.py:1994] Estimator's model_fn (<function v1_model_fn_builder.<locals>.model_fn at 0x7f83d77c9ae8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': '/media/xxxx/NewVolume/ALBERT/tmp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f83db463208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I0114 16:47:03.488247 140206005626688 estimator.py:212] Using config: {'_model_dir': '/media/xxxx/NewVolume/ALBERT/tmp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 0, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f83db463208>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I0114 16:47:03.488590 140206005626688 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W0114 16:47:03.488810 140206005626688 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:303: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

W0114 16:47:03.488927 140206005626688 module_wrapper.py:139] From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:303: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:309: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

W0114 16:47:03.604506 140206005626688 module_wrapper.py:139] From /media/xxxx/NewVolume/ALBERT/run_squad_v1.py:309: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

Traceback (most recent call last):
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/xxxx/NewVolume/ALBERT/run_squad_v1.py", line 478, in <module>
    tf.compat.v1.app.run()
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/xxxx/.local/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/xxxx/.local/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/media/xxxx/NewVolume/ALBERT/run_squad_v1.py", line 309, in main
    if (tf.gfile.Exists(FLAGS.predict_feature_file) and tf.gfile.Exists(
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/site-packages/tensorflow_core/python/lib/io/file_io.py", line 262, in file_exists
    return file_exists_v2(filename)
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/site-packages/tensorflow_core/python/lib/io/file_io.py", line 280, in file_exists_v2
    pywrap_tensorflow.FileExists(compat.as_bytes(path))
  File "/home/xxxx/anaconda3/envs/albert/lib/python3.6/site-packages/tensorflow_core/python/util/compat.py", line 71, in as_bytes
    (bytes_or_text,))
TypeError: Expected binary or unicode string, got None

All the requirements specified in the requirements.txt file have been installed in the environment. Is it a problem with the config file or the data files?

aravindchaluvadi avatar Jan 14 '20 11:01 aravindchaluvadi

I am having this issue as well. Did you find a solution?

theword avatar Jan 15 '20 22:01 theword

I can't fix this right now, but I believe it is a bug with the default value of --predict_feature_file. It defaults to None, but should default to empty string: https://github.com/google-research/ALBERT/blob/master/run_squad_v1.py#L77

0x0539 avatar Jan 25 '20 18:01 0x0539
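
To illustrate why a None default ends in this TypeError while an empty string would not, here is a minimal sketch. The as_bytes function below is a simplified stand-in written for this example, not TensorFlow's actual code, and the /tmp path is hypothetical. It only mimics the conversion step that appears at the bottom of the traceback.

```python
def as_bytes(bytes_or_text, encoding="utf-8"):
    # Simplified stand-in for the conversion step seen in the traceback
    # (an assumption for illustration, not the library's implementation).
    if isinstance(bytes_or_text, str):
        return bytes_or_text.encode(encoding)
    if isinstance(bytes_or_text, bytes):
        return bytes_or_text
    raise TypeError(
        "Expected binary or unicode string, got %r" % (bytes_or_text,))


def describe(flag_value):
    # Report what happens when a flag value reaches the conversion step.
    try:
        return "ok: %r" % (as_bytes(flag_value),)
    except TypeError as err:
        return "TypeError: %s" % err


print(describe(None))                     # a path flag left at its None default
print(describe(""))                       # an empty-string default
print(describe("/tmp/predict.tfrecord"))  # hypothetical explicit path
```

Only the None case raises; an empty string or a real path passes the conversion, which is why either changing the default or passing the flag explicitly avoids the crash.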

Provide a name for the predict feature file, like: --predict_feature_file=, and a file with this name will be created.

Rachnas avatar Feb 03 '20 12:02 Rachnas
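
For anyone wrapping the script, a small pre-flight check can report which feature-file flag is missing before any file-existence call is made. This is only a sketch under assumptions: argparse stands in for the absl flags the real script uses, build_parser and missing_path_flags are hypothetical helpers, and the mapping of flags to modes (train_feature_file for training, predict_feature_file for prediction) is inferred from the traceback and the replies in this thread.

```python
import argparse


def build_parser():
    # Hypothetical subset of the flags discussed in this thread;
    # the real script defines many more.
    parser = argparse.ArgumentParser()
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_predict", action="store_true")
    parser.add_argument("--train_feature_file", default=None)
    parser.add_argument("--predict_feature_file", default=None)
    return parser


def missing_path_flags(args):
    # Return the names of path flags that are needed for the requested
    # modes but were left at their None default.
    needed = []
    if args.do_train:
        needed.append("train_feature_file")
    if args.do_predict:
        needed.append("predict_feature_file")
    return [name for name in needed if getattr(args, name) is None]


if __name__ == "__main__":
    args = build_parser().parse_args()
    missing = missing_path_flags(args)
    if missing:
        raise SystemExit("Missing required flags: " + ", ".join(missing))
```

With only --do_predict and --train_feature_file set, as in the command posted earlier, this check would name predict_feature_file as the missing flag instead of failing deep inside the file-existence call.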