[Bug]: Dataflow - MongoDB-to-BigQuery batch mode failing with filter on data
Related Template(s)
MongoDB-to-BigQuery
Template Version
v2
What happened?
I have a function that checks whether a field is true. If it is, the function returns null so that the document is skipped and not written to BigQuery.
I have also tried returning undefined and returning "", but I keep getting the same error:
com.google.cloud.teleport.v2.common.UncaughtExceptionLogger - The template launch failed.
java.lang.IllegalArgumentException: schema can not be null
Below is a code snippet
function deliveries_transform(input_doc) {
  var doc = JSON.parse(input_doc);
  // Filters
  if (doc.has_parent) {
    return null;
  }
  // Return after stringifying
  return JSON.stringify(doc);
}
I followed the example described here: https://cloud.google.com/dataflow/docs/guides/templates/create-template-udf#filter_events
The job was created from the Google Cloud console, not via the API or an SDK.
Relevant log output
[
{
"insertId": "",
"jsonPayload": {
"line": "exec.go:66",
"message": "com.google.cloud.teleport.v2.common.UncaughtExceptionLogger - The template launch failed.\njava.lang.IllegalArgumentException: schema can not be null\n\tat org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)\n\tat org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.withSchema(BigQueryIO.java:2679)\n\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.run(MongoDbToBigQuery.java:154)\n\tat com.google.cloud.teleport.v2.mongodb.templates.MongoDbToBigQuery.main(MongoDbToBigQuery.java:96)\n"
},
"resource": {
"type": "dataflow_step",
"labels": {
"region": "",
"project_id": "",
"step_id": "",
"job_name": "mongodb-to-bigquery-batch",
"job_id": ""
}
},
"timestamp": "2024-02-12T21:45:00.037010Z",
"severity": "ERROR",
"labels": {
"compute.googleapis.com/resource_name": "",
"dataflow.googleapis.com/region": "us-east4",
"dataflow.googleapis.com/job_id": "",
"compute.googleapis.com/resource_id": "",
"compute.googleapis.com/resource_type": "",
"dataflow.googleapis.com/job_name": "mongodb-to-bigquery-batch"
},
"logName": "",
"receiveTimestamp": "2024-02-12T21:45:02.855403339Z",
"errorGroups": [
{
"id": "CPXppsbT8JP4nQE"
}
]
},
{
"insertId": "",
"jsonPayload": {
"message": "Error: Template launch failed: exit status 1",
"line": "launch.go:80"
},
"resource": {
"type": "dataflow_step",
"labels": {
"job_name": "mongodb-to-bigquery-batch",
"job_id": "",
"step_id": "",
"project_id": "",
"region": ""
}
},
"timestamp": "",
"severity": "ERROR",
"labels": {
"dataflow.googleapis.com/region": "",
"dataflow.googleapis.com/job_id": "",
"compute.googleapis.com/resource_id": "",
"compute.googleapis.com/resource_type": "",
"compute.googleapis.com/resource_name": "",
"dataflow.googleapis.com/job_name": "mongodb-to-bigquery-batch"
},
"logName": "",
"receiveTimestamp": "2024-02-12T21:45:02.855403339Z"
},
{
"textPayload": "Error occurred in the launcher container: Template launch failed. See console logs.",
"insertId": "xl5y9bd22ed",
"resource": {
"type": "dataflow_step",
"labels": {
"project_id": "",
"job_id": "2024-02-12_13_43_46-15601135711795228441",
"job_name": "mongodb-to-bigquery-batch",
"step_id": "",
"region": ""
}
},
"timestamp": "2024-02-12T21:47:43.432514787Z",
"severity": "ERROR",
"labels": {
"dataflow.googleapis.com/job_id": "2024-02-12_13_43_46-15601135711795228441",
"dataflow.googleapis.com/region": ",
"dataflow.googleapis.com/log_type": "",
"dataflow.googleapis.com/job_name": "mongodb-to-bigquery-batch"
},
"logName": "",
"receiveTimestamp": "2024-02-12T21:47:43.962727013Z"
}
]
Hi,
I'm encountering the same issue. If I use a "return null" statement to skip a document row, I get the "schema can not be null" error. Did anyone manage to resolve this? Many thanks!
Hi @britz89. I have not found a fix, but I found an alternative way to skip those documents: I pull all the data into BigQuery first, then run a saved query over the import that creates a new table with the filter applied. A rough sketch of that filter step is below.
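Here is a minimal sketch of that second step, written as a small Node.js script with the @google-cloud/bigquery client rather than a saved query in the console. The table and column names (my_dataset.deliveries_staging, my_dataset.deliveries, source_data) are placeholders for my setup, so treat them as assumptions and adjust to whatever your import actually produces.

const {BigQuery} = require('@google-cloud/bigquery');

// Filter step of the workaround: read the full import and keep only the
// documents the UDF was supposed to keep.
// Hypothetical names used for illustration:
//   my_dataset.deliveries_staging - table the template loaded the raw documents into
//   my_dataset.deliveries         - filtered table the pipeline should have produced
//   source_data                   - column holding each document as a JSON string
async function filterImportedDeliveries() {
  const bigquery = new BigQuery();
  const query = `
    CREATE OR REPLACE TABLE my_dataset.deliveries AS
    SELECT *
    FROM my_dataset.deliveries_staging
    WHERE IFNULL(JSON_VALUE(source_data, '$.has_parent'), 'false') != 'true'`;
  const [job] = await bigquery.createQueryJob({query: query, location: 'us-east4'});
  await job.getQueryResults();
  console.log('Filter query ' + job.id + ' finished');
}

filterImportedDeliveries().catch(console.error);

In my import the raw document lands as a JSON string, so JSON_VALUE can read has_parent directly; if your import flattens the documents into columns, filter on the materialized column instead.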
So, if I understood correctly, you are pulling the full collection, storing it in a temporary table, and then filtering the rows in a subsequent step. Correct? My requirement is to avoid a full copy of the collection, so I hope this issue gets fixed; otherwise I will have to find another way. Thanks for your suggestion, btw!
Yes, that is what I am currently doing until this is fixed, because I need a working solution now. The other alternative I have thought about is using a custom batch template and fixing the issue there myself.