Use now + 2mins as the end timestamp for change stream read API if the connector endTimestamp is omitted
V1 change stream can use null end timestamp for the query, however V2 the end timestamp of the query should be NOT NULL, and should be at most 30 mins from the max(now, start_timestamp).
To allow users to still omit the connector endTimestamp field to run the connector forever, but to give a valid endTimestamp when try to query change stream, we set the change stream endTimestamp in this case as now + 2 mins.
This solution works as the Apache beam checkpoints the ReadChangeStreamPartition execution every 5s or 5MB of output data produced. Moreover the change stream query has a hard 1 min deadline.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers
Checked failed integration test, not relevant to this PR. No need to block the review.
assign set of reviewers
Assigning reviewers:
R: @m-trieu for label java. R: @nielm for label spanner.
Note: If you would like to opt out of this review, comment assign to next reviewer.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
Reminder, please take a look at this pr: @m-trieu @nielm
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:
R: @robertwb for label java. R: @nielm for label spanner.
Available commands:
stop reviewer notifications- opt out of the automated review toolingremind me after tests pass- tag the comment author after tests passwaiting on author- shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
Did we run a pipeline to test this?
Yes I ran a pipeline containing this change in order to run a connector without specifying end timestamp.
However, I don't record any evidence for that. If you have opinion, I can run again and put evidence somewhere maybe here. Please let me know your thoughts, thanks!
Just so I understand things correctly:
- The change stream v2 partition token does not have an explicit end timestamp
- Users however are required to provide an end timestamp to their change stream V2 queries within 30 minutes in the future
- We chose 2 minutes, since it is relatively small and > the Dataflow 5s checkpointing.
Was wondering if you could confirm?
Just so I understand things correctly:
- The change stream v2 partition token does not have an explicit end timestamp
- Users however are required to provide an end timestamp to their change stream V2 queries within 30 minutes in the future
- We chose 2 minutes, since it is relatively small and > the Dataflow 5s checkpointing.
Was wondering if you could confirm?
- V2 change stream required end_timestamp, and it's cannot be omitted.
- However, connector end_timestamp can be skipped. If so, the connector needs to run forever.
- Yes.
Please let me know if more details/explanation is needed. Thanks!
Friendly ping :)
please fix:
Execution failed for task ':sdks:java:io:google-cloud-platform:spotlessJavaCheck'.
> The following files had format violations:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/changestreams/action/QueryChangeStreamAction.java
@@ -299,7 +299,8 @@
??}
??//?Return?(now?+?2?mins)?as?the?end?timestamp?for?reading?change?streams.?This?is?only?used?if
-??//?users?want?to?run?the?connector?forever.?This?approach?works?because?Google?Dataflow?checkpoints
+??//?users?want?to?run?the?connector?forever.?This?approach?works?because?Google?Dataflow
+??//?checkpoints
??//?every?5s?or?5MB?output?provided?and?the?change?stream?query?has?deadline?for?1?min.
??private?Timestamp?getNextReadChangeStreamEndTimestamp()?{
????final?Timestamp?current?=?Timestamp.now();
Run './gradlew :sdks:java:io:google-cloud-platform:spotlessApply' to fix these violations.