
Dataflow documentation incorrect

Open Mythicalian opened this issue 2 years ago • 0 comments

I followed the streaming documentation successfully and then went through the dataflow documentation, but found that it is both out of date and not working for me.

#1 For the BigQuery command, the referenced file src/main/resources/errors-schema.json doesn't exist in the repo.
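In case it helps to unblock: since errors-schema.json is missing, one workaround would be to build the error table schema in code rather than loading it from the resource file. The sketch below is only an illustration; the field names (item, error_message, timestamp) are assumptions, not the schema the project actually expects.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableSchema;

import java.util.Arrays;

public class ErrorsSchema {
    // Hypothetical stand-in for the missing src/main/resources/errors-schema.json.
    // The field names here are guesses, not the project's actual error schema.
    public static TableSchema create() {
        return new TableSchema().setFields(Arrays.asList(
                new TableFieldSchema().setName("item").setType("STRING"),
                new TableFieldSchema().setName("error_message").setType("STRING"),
                new TableFieldSchema().setName("timestamp").setType("TIMESTAMP")));
    }
}
```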

#2 This appears in the logs, but no tables are ever created in BigQuery:

"Executing operation polygonBlocksReadFromPubSub/PubsubUnboundedSource+polygonBlocksReadFromPubSub/MapElements/Map+polygonBlocksConvertToTableRows+polygonBlocksWriteToBigQuery/PrepareWrite/ParDo(Anonymous)+polygonBlocksWriteToBigQuery/StreamingInserts/CreateTables/ParDo(CreateTables)+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/ShardTableWrites+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/TagWithUniqueIds+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/Window.Into()/Window.Assign+polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/GroupByKey/WriteStream"
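For reference, the CreateTables step in that operation only materializes the destination table when the write is configured with CREATE_IF_NEEDED and a table schema. Below is a minimal sketch of how such a streaming-inserts write is typically wired in Beam, under that assumption; the table spec and helper name are placeholders, not the repo's actual code.

```java
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.values.PCollection;

public class BlocksWrite {
    // Hypothetical helper: a streaming-inserts write that is allowed to create
    // its destination table. The table spec below is a placeholder.
    public static void writeBlocks(PCollection<TableRow> rows, TableSchema blocksSchema) {
        rows.apply("polygonBlocksWriteToBigQuery",
                BigQueryIO.writeTableRows()
                        .to("my-project:crypto_polygon.blocks")  // placeholder project/dataset/table
                        .withSchema(blocksSchema)                // table is only auto-created when a schema is supplied
                        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
    }
}
```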

#3 In the polygonBlocksWriteToBigQuery stage there are a bunch of warnings; I'm not sure whether that's relevant, but no more logs appear after that. I've mostly focused on getting the blocks from my local Polygon node into BigQuery:

2022-07-07T18:55:58.630Z Operation ongoing in step polygonBlocksWriteToBigQuery/StreamingInserts/StreamingWriteTables/StreamingWrite for at least 02h10m00s without outputting or completing in state finish
  at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
  at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
  at [email protected]/java.util.concurrent.FutureTask.awaitDone(FutureTask.java:447)
  at [email protected]/java.util.concurrent.FutureTask.get(FutureTask.java:190)
  at app//org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:817)
  at app//org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.insertAll(BigQueryServicesImpl.java:882)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.flushRows(StreamingWriteFn.java:143)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn.finishBundle(StreamingWriteFn.java:115)
  at app//org.apache.beam.sdk.io.gcp.bigquery.StreamingWriteFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
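For what it's worth, a stall inside insertAll like this often means the streaming insert keeps failing and being retried without the reason surfacing (missing dataset, schema mismatch, permissions). Below is a hedged diagnostic sketch, not the repo's code, showing how Beam's extended error info could be used to log the underlying per-row error; the class and method names are hypothetical.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FailedInsertLogging {
    private static final Logger LOG = LoggerFactory.getLogger(FailedInsertLogging.class);

    // Hypothetical diagnostic: capture the per-row error that insertAll keeps
    // retrying, so the cause shows up in the worker logs instead of an
    // endless "Operation ongoing" stall.
    public static void writeAndLogFailures(PCollection<TableRow> rows, BigQueryIO.Write<TableRow> write) {
        WriteResult result = rows.apply("WriteWithErrorInfo",
                write.withExtendedErrorInfo()
                        // Without a finite retry policy, failed rows never reach the error output.
                        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

        result.getFailedInsertsWithErr()
                .apply("LogFailedInserts", ParDo.of(new DoFn<BigQueryInsertError, Void>() {
                    @ProcessElement
                    public void processElement(@Element BigQueryInsertError err) {
                        LOG.error("Insert failed for row {}: {}", err.getRow(), err.getError());
                    }
                }));
    }
}
```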

I would really like to get this ETL job running properly, so any advice would be greatly appreciated.

Mythicalian · Jul 07 '22 18:07