conversational-datasets

Missing required option: region – in Google Cloud

Open drunkinlove opened this issue 4 years ago • 6 comments

Hello!

I get the following error when trying to execute the create_data.py script in the Google Cloud Shell:

Traceback (most recent call last):
  File "reddit/create_data.py", line 347, in <module>
    run()
  File "reddit/create_data.py", line 285, in run
    p = beam.Pipeline(options=pipeline_options)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 203, in __init__
    'Pipeline has validations errors: \n' + '\n'.join(errors))
ValueError: Pipeline has validations errors:
Missing required option: region.

I'm using the latest version of apache-beam, 2.23.0.

drunkinlove avatar Sep 01 '20 11:09 drunkinlove
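
For reference, the missing option can also be set programmatically on the pipeline options rather than on the command line (a later comment shows the equivalent --region flag). A minimal sketch, assuming Beam's GoogleCloudOptions view; the project, region, and bucket names are placeholders:

import apache_beam as beam
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

# Build the pipeline options and fill in the Google Cloud settings directly.
options = PipelineOptions(runner="DataflowRunner")
gcp_options = options.view_as(GoogleCloudOptions)
gcp_options.project = "my-project"                 # placeholder project ID
gcp_options.region = "us-east1"                    # the option the validator reports as missing
gcp_options.temp_location = "gs://my-bucket/temp"  # placeholder GCS path

p = beam.Pipeline(options=options)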

I fixed that by installing the necessary dependencies from requirements.txt. This is the error I get now:

user@cloudshell:~/conversational-datasets (reddit-data-288210)$ python reddit/create_data.py \
>   --output_dir ${DATADIR?} \
>   --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
>   --runner DataflowRunner \
>   --temp_location ${DATADIR?}/temp \
>   --staging_location ${DATADIR?}/staging \
>   --project ${PROJECT?} \
>   --dataset_format JSON
********************************************************************************
Python 2 is deprecated. Upgrade to Python 3 as soon as possible.
See https://cloud.google.com/python/docs/python2-sunset
To suppress this warning, create an empty ~/.cloudshell/no-python-warning file.
The command will automatically proceed in  seconds or on any key.
********************************************************************************
WARNING: Logging before flag parsing goes to stderr.
I0902 11:45:25.874641 140704769283904 apiclient.py:464] Starting GCS upload to gs://reddit-data-bucket/reddit/20200902/staging/beamapp-user-0902114525-652460.1599047125.652742/pipeline.pb...
I0902 11:45:25.880270 140704769283904 transport.py:157] Attempting refresh to obtain initial access_token
Traceback (most recent call last):
  File "reddit/create_data.py", line 347, in <module>
    run()
  File "reddit/create_data.py", line 341, in run
    result = p.run()
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 390, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 403, in run
    return self.runner.run_pipeline(self)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 364, in run_pipeline
    self.dataflow_client.create_job(self.job), self)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/utils/retry.py", line 180, in wrapper
    return fun(*args, **kwargs)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 485, in create_job
    self.create_job_description(job)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 511, in create_job_description
    StringIO(job.proto_pipeline.SerializeToString()))
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 467, in stage_file
    response = self._storage_client.objects.Insert(request, upload=upload)
  File "/home/user/.local/lib/python2.7/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 971, in Insert
    download=download)
  File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/base_api.py", line 720, in _RunMethod
    http, http_request, **opts)
  File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 356, in MakeRequest
    max_retry_wait, total_wait_sec))
  File "/home/user/.local/lib/python2.7/site-packages/apitools/base/py/http_wrapper.py", line 304, in HandleExceptionsAndRebuildHttpConnections
    raise retry_args.exc
httplib2.SSLHandshakeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)

drunkinlove avatar Sep 02 '20 11:09 drunkinlove

Hi, I got the same error and solved it by updating httplib2 to the latest version. Regarding the requirements, I also updated to tensorflow==1.15.0, since version 1.14.0 gave me the following error: "No module named deprecation_wrapper".

AntoineSimoulin avatar Jul 02 '21 09:07 AntoineSimoulin
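
A quick way to confirm the environment matches this suggestion is to print the installed versions. A minimal check, assuming httplib2 has already been upgraded and tensorflow pinned to 1.15.0:

import httplib2
import tensorflow as tf

# Both packages expose a version string.
print(httplib2.__version__)  # should show the freshly upgraded version
print(tf.__version__)        # expected: 1.15.0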

Regarding the region, you can pass the --region flag on the command line:

python reddit/create_data.py \
  --output_dir ${DATADIR?} \
  --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} \
  --runner DataflowRunner \
  --temp_location ${DATADIR?}/temp \
  --staging_location ${DATADIR?}/staging \
  --project ${PROJECT?} \
  --dataset_format JSON \
  --region us-east1

AntoineSimoulin avatar Jul 02 '21 09:07 AntoineSimoulin

Hi, I got the same error and solved it by updating httplib2 to the latest version. Regarding the requirements, I also updated to tensorflow==1.15.0, since version 1.14.0 gave me the following error: "No module named deprecation_wrapper".

Hmmm, did you change anything in the requirements.txt file other than updating httplib2 to the newest version and tensorflow to 1.15.0? I did both of those things, but now I'm getting a "No module named module_wrapper" error :(

amorisot avatar Oct 02 '21 00:10 amorisot

I was wondering if there was an update on the "No module named module_wrapper" error. Thanks!

alu13 avatar Feb 09 '22 05:02 alu13

Regarding the region, you can pass the --region flag on the command line:

python reddit/create_data.py --output_dir ${DATADIR?} --reddit_table ${PROJECT?}:${DATASET?}.${TABLE?} --runner DataflowRunner --temp_location ${DATADIR?}/temp --staging_location ${DATADIR?}/staging --project ${PROJECT?} --dataset_format JSON --region us-east1

Hi,

I tried your method, but I found it cannot sign in to Google, and apitools has been deprecated.

Is there another way to download the Reddit dataset? Thanks @AntoineSimoulin

pygongnlp avatar Dec 19 '22 10:12 pygongnlp