"Please look into the error stream for more details."
I've been working on loading large data sets into BigQuery from CSV files in GCS. It works fine for some tables, but for others I get the following error:
bigquery.errors.JobExecutingException: Reason:invalid. Message:Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
I have a feeling it's schema related, but I can't tell. How can I 'look into the error stream' to see the details? insertErrors isn't in my job object after it fails, and I don't see any error details beyond what printing the exception gives me.
job = bqClient.client.import_data_from_uris([gsFile], dataset, table, schema=None)

try:
    job_id, _results = bqClient.client.wait_for_job(job)
    print("Job ID: " + str(job_id))
    print("Results: " + str(_results))
except Exception as e:
    print(str(e))
    print(str(job))
My code works great for some tables but not others, so I'm trying to find out what's wrong with this one particular table.
I'm not seeing errors or insertErrors on my job.
I have another piece of code which does have errors on occasion:
job = bqClient.client.push_rows(
    to_bq_jobs[path]['dataset'],
    to_bq_jobs[path]['table'],
    rowsToAdd[rowCount:topCount],
    # insert_id_key=to_bq_jobs[path]['id_to_match']
)
# print(str(job))
if 'insertErrors' in job:
    print(str(job['insertErrors']))
And that lets me print out the errors. However, on this load job there isn't an errors or insertErrors key in the job dict.
The status dict just has 'state': "running"; there isn't any error listed.
I tried changing to job_id, _results = bqClient.client.wait_for_job(job), which still throws the same error, and _results doesn't contain anything useful.
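My best guess is that the 'error stream' is the status.errors list on the raw job resource, which the wrapper's exception doesn't surface. Something like this sketch is what I'm imagining (it goes straight to the BigQuery REST API with googleapiclient instead of through bqClient; project_id and job_id below are placeholders):

from googleapiclient import discovery

project_id = "my-project"    # placeholder
job_id = "my-load-job-id"    # placeholder

# Build a BigQuery v2 service client (uses application default credentials).
service = discovery.build('bigquery', 'v2')

# Fetch the raw job resource for the failed load job.
resource = service.jobs().get(projectId=project_id, jobId=job_id).execute()

# status.errorResult is the fatal error; status.errors is the full error stream.
status = resource.get('status', {})
print(status.get('errorResult'))
for err in status.get('errors', []):
    print(err)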
Can you dump the contents of the dict that gets returned?
I've also been testing with a newline-delimited JSON import, which fails with a similar error:
Reason:invalid. Message:Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
{
"configuration": {
"jobType": "LOAD",
"load": {
"destinationTable": {
"datasetId": "redacted",
"projectId": "redacted",
"tableId": "stages_copy"
},
"schema": {
"fields": [
{
"mode": "NULLABLE",
"name": "Stage_id",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "Stage_Order",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "Stage_Name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "Stage_Pipeline_id",
"type": "INTEGER"
}
]
},
"sourceFormat": "NEWLINE_DELIMITED_JSON",
"sourceUris": [
"gs://redacted/stages.json"
]
}
},
"etag": "\"redacted/29pKp--d60WMuqds86QyFCCo47Q\"",
"id": "redacted",
"jobReference": {
"jobId": "redacted",
"location": "US",
"projectId": "redacted"
},
"kind": "bigquery#job",
"selfLink": "https://www.googleapis.com/bigquery/v2/projects/redacted?location=US",
"statistics": {
"creationTime": "1539023656058",
"startTime": "1539023656543"
},
"status": {
"state": "RUNNING"
},
"user_email": "redacted"
}
Our workaround is to surround all our job.result() calls with a try/except that prints out job.errors, but it would be really nice if the errors were just printed out so that we didn't have to do that!
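For reference, here's a minimal sketch of that workaround, assuming the google-cloud-bigquery client (the source URI and destination table below are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder load job; the same try/except pattern works for any job object.
job = client.load_table_from_uri(
    "gs://my-bucket/stages.json",           # placeholder source URI
    "my-project.my_dataset.stages_copy",    # placeholder destination table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    ),
)

try:
    job.result()  # waits for the job and raises on failure
except Exception:
    # job.errors holds the full error stream, not just the first message.
    for err in job.errors or []:
        print(err)
    raise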