conversational-datasets
Missing required option: region – in Google Cloud
Hello!
I get the following error when trying to execute the create_data.py script in the Google Cloud Shell:
Traceback (most recent call last):
File "reddit/create_data.py", line 347, in <module>
run()
File "reddit/create_data.py", line 285, in run
p = beam.Pipeline(options=pipeline_options)
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 203, in __init__
'Pipeline has validations errors: \n' + '\n'.join(errors))
ValueError: Pipeline has validations errors:
Missing required option: region.
I'm using the latest version of apache-beam, 2.23.0.
Fixed that by installing the necessary dependencies through requirements.txt. This is the error I get now:
user@cloudshell:~/conversational-datasets (reddit-data-288210)$ python reddit/create_data.py \
> --output_dir ${DATADIR?} \
> --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
> --runner DataflowRunner \
> --temp_location ${DATADIR?}/temp \
> --staging_location ${DATADIR?}/staging \
> --project ${PROJECT?} \
> --dataset_format JSON
********************************************************************************
Python 2 is deprecated. Upgrade to Python 3 as soon as possible.
See https://cloud.google.com/python/docs/python2-sunset
To suppress this warning, create an empty ~/.cloudshell/no-python-warning file.
The command will automatically proceed in seconds or on any key.
********************************************************************************
WARNING: Logging before flag parsing goes to stderr.
I0902 11:45:25.874641 140704769283904 apiclient.py:464] Starting GCS upload to gs://reddit-data-bucket/reddit/20200902/staging/beamapp-user-0902114525-652460.1599047125.652742/pipeline.pb...
I0902 11:45:25.880270 140704769283904 transport.py:157] Attempting refresh to obtain initial access_token
Traceback (most recent call last):
File "reddit/create_data.py", line 347, in <module>
run()
File "reddit/create_data.py", line 341, in run
result = p.run()
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 390, in run
self.to_runner_api(), self.runner, self._options).run(False)
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 403, in run
return self.runner.run_pipeline(self)
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 364, in run_pipeline
self.dataflow_client.create_job(self.job), self)
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/utils/retry.py", line 180, in wrapper
return fun(*args, **kwargs)
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 485, in create_job
self.create_job_description(job)
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 511, in create_job_description
StringIO(job.proto_pipeline.SerializeToString()))
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 467, in stage_file
response = self._storage_client.objects.Insert(request, upload=upload)
File "/home/user/.local/lib/python2.7/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 971, in Insert
download=download)
File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/base_api.py", line 720, in _RunMethod
http, http_request, **opts)
File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 356, in MakeRequest
max_retry_wait, total_wait_sec))
File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 304, in HandleExceptionsAndRebuildHttpConnections
raise retry_args.exc
httplib2.SSLHandshakeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)
Hi, I got the same error and solved it by updating httplib2 to the latest version. Regarding the requirements, I also updated to tensorflow==1.15.0, since version 1.14.0 gives me the following error: "No module named deprecation_wrapper".
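In Cloud Shell, that amounts to something like the following (the --user flag is an assumption matching the ~/.local install paths in the traceback above; adjust it if your setup differs):
pip install --user --upgrade httplib2
pip install --user tensorflow==1.15.0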
Regarding the region, you can add the --region flag to the command:
python reddit/create_data.py \
  --output_dir ${DATADIR?} \
  --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
  --runner DataflowRunner \
  --temp_location ${DATADIR?}/temp \
  --staging_location ${DATADIR?}/staging \
  --project ${PROJECT?} \
  --dataset_format JSON \
  --region us-east1
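Newer Beam releases require a region for the Dataflow runner, so you can also set it programmatically instead of on the command line. A minimal sketch with placeholder project and region values, not the repo's actual code:

import apache_beam as beam
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

options = PipelineOptions()
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'my-project'  # placeholder project id
google_cloud_options.region = 'us-east1'     # the option the validation error asks for

p = beam.Pipeline(options=options)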
Hmmm, did you change anything in the requirements.txt file other than updating httplib2 to the newest version and tensorflow to 1.15.0? I did both of those things, but now I am getting a "No module named module_wrapper" error :(
I was wondering if there was an update on the "No module named module_wrapper" error. Thanks!
Hi,
I used your method, but I found it cannot sign in to Google, and apitools has been deprecated.
Is there another way to download the Reddit dataset? Thanks @AntoineSimoulin