data-science-on-gcp
chapter 04: df07.py - Unable to open file: gs://BUCKETNAME/flights/staging/ch04timecorr.1656567385.996847/pipeline.pb.
Hi! I get the following log when I try to run df07.py.
./df07.py --project PROJECT --bucket BUCKETNAME --region us-central1
Correcting timestamps and writing to BigQuery dataset
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2527: BeamDeprecationWarning: options is deprecated since First stable release. References to
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2022-06-29_22_36_30-1790320629162913076?project=<ProjectId>
Traceback (most recent call last):
File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 202, in
Any suggestions would be appreciated. Thank you.
Looking at the last line, it looks like you forgot to specify the bucket in the input to df07.py:
FAILED, Error: Unable to open file: gs://BUCKETNAME/flights/staging/ch04timecorr.1656567385.996847/pipeline.pb
thanks, Lak
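A literal BUCKETNAME in the staging path suggests the placeholder was passed through unchanged. One way to catch this before the job is submitted is to validate the --bucket value up front. The following is a minimal sketch, not part of df07.py: `validate_bucket_arg`, `PLACEHOLDERS`, and `BUCKET_PATTERN` are hypothetical names, the placeholder spellings are assumptions, and the pattern is only a simplified form of the Cloud Storage naming rules.

```python
import re

# Assumed placeholder spellings that indicate the shell variable was
# never substituted; extend this set as needed.
PLACEHOLDERS = {"BUCKETNAME", "BUCKET", "$BUCKETNAME", "${BUCKETNAME}"}

# Simplified form of the Cloud Storage naming rules: 3-63 characters,
# lowercase letters, digits, '-', '_' and '.', starting and ending with
# a letter or digit.
BUCKET_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def validate_bucket_arg(bucket):
    """Return the bucket name unchanged, or raise ValueError if it looks wrong."""
    if not bucket:
        raise ValueError("bucket is empty; is the shell variable set?")
    if bucket in PLACEHOLDERS:
        raise ValueError(f"bucket is still the placeholder {bucket!r}")
    if not BUCKET_PATTERN.fullmatch(bucket):
        raise ValueError(f"{bucket!r} does not look like a valid bucket name")
    return bucket
```

In a script like df07.py, such a check could be applied to `args['bucket']` before `run(...)` is called, so a bad value fails locally instead of after the Dataflow job has been launched.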
On Wed, Jun 29, 2022, 11:01 PM Arturo Bringas @.***> wrote:
Hi! I get the following log when I try to run df07.py.
./df07.py --project ${PROJECT} --bucket ${BUCKETNAME} --region us-central1
Correcting timestamps and writing to BigQuery dataset
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2527: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = pcoll.pipeline.options.view_as(
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery_file_loads.py:1129: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = p.options.view_as(GoogleCloudOptions).temp_location
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2022-06-29_22_36_30-1790320629162913076?project=<ProjectId>
Traceback (most recent call last):
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 202, in <module>
    run(project=args['project'], bucket=args['bucket'], region=args['region'])
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 177, in run
    (events
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 598, in __exit__
    self.result.wait_until_finish()
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1673, in wait_until_finish
    raise DataflowRuntimeException(
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error: Unable to open file: gs://BUCKETNAME/flights/staging/ch04timecorr.1656567385.996847/pipeline.pb.
Any suggestions would be appreciated. Thank you.
I appreciate the quick response. This is the latest log:
act_arturo_b@cloudshell:~/data-science-on-gcp/04_streaming/transform (ds-on-gcp-353305)$ ./df07.py --project ds-on-gcp-353305 --bucket ${BUCKETNAME} --region us-central1
Correcting timestamps and writing to BigQuery dataset
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2527: BeamDeprecationWarning: options is deprecated since First stable release. References to
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2022-06-29_23_27_32-11374214288357084698?project=<ProjectId>
Traceback (most recent call last):
File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 202, in
The problem is the same.
Any help will be appreciated.
Does this bucket exist? Is the bucket in the us-central1 region?
ds-on-gcp-353305-dsongcp
In any case, the pipeline is failing because it is not able to create this file.
Lak
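Both questions above can be checked from Python. The sketch below is illustrative only: `check_bucket` is a hypothetical helper, and it assumes the `google-cloud-storage` client library, whose `Client.lookup_bucket` returns `None` when the bucket is not found. A region mismatch is reported for inspection and is not necessarily the cause of the failure.

```python
def check_bucket(client, bucket_name, expected_region):
    """Return a list of problems found; an empty list means none were detected.

    `client` is expected to behave like google.cloud.storage.Client.
    Errors other than "not found" (for example, permission errors)
    propagate as exceptions.
    """
    bucket = client.lookup_bucket(bucket_name)
    if bucket is None:
        return [f"gs://{bucket_name} does not exist"]

    problems = []
    # Bucket locations are reported in upper case, e.g. "US-CENTRAL1".
    location = (bucket.location or "").lower()
    if location != expected_region.lower():
        problems.append(
            f"gs://{bucket_name} is in {location or 'an unknown location'}, "
            f"not {expected_region.lower()}"
        )
    return problems


# Example usage (requires google-cloud-storage and valid credentials):
#
#   from google.cloud import storage
#   for problem in check_bucket(storage.Client(project="ds-on-gcp-353305"),
#                               "ds-on-gcp-353305-dsongcp", "us-central1"):
#       print(problem)
```

Passing the client in as a parameter keeps the helper free of credentials, so it can be exercised with a stand-in object before being pointed at a real project.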
On Wed, Jun 29, 2022 at 11:33 PM Arturo Bringas @.***> wrote:
I appreciate the quick response. This is the latest log:
act_arturo_b@cloudshell:~/data-science-on-gcp/04_streaming/transform (ds-on-gcp-353305)$ ./df07.py --project ds-on-gcp-353305 --bucket ${BUCKETNAME} --region us-central1
Correcting timestamps and writing to BigQuery dataset
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery.py:2527: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = pcoll.pipeline.options.view_as(
/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/io/gcp/bigquery_file_loads.py:1129: BeamDeprecationWarning: options is deprecated since First stable release. References to .options will not be supported
  temp_location = p.options.view_as(GoogleCloudOptions).temp_location
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2022-06-29_23_27_32-11374214288357084698?project=<ProjectId>
Traceback (most recent call last):
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 202, in <module>
    run(project=args['project'], bucket=args['bucket'], region=args['region'])
  File "/home/act_arturo_b/data-science-on-gcp/04_streaming/transform/./df07.py", line 177, in run
    (events
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/pipeline.py", line 598, in __exit__
    self.result.wait_until_finish()
  File "/home/act_arturo_b/.local/lib/python3.9/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1673, in wait_until_finish
    raise DataflowRuntimeException(
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error: Unable to open file: gs://ds-on-gcp-353305-dsongcp/flights/staging/ch04timecorr.1656570447.957722/pipeline.pb.
The problem is the same.
Any help will be appreciated.