
"Please look into the error stream for more details."

ampkeegan opened this issue · 5 comments

I've been working on loading large data sets into BigQuery from a CSV in GCS. It works fine for some tables, but for others I get the following error:

bigquery.errors.JobExecutingException: Reason:invalid. Message:Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.

I have a feeling it's schema related, but I can't tell. How can I 'look into the error stream' to see the details? insertErrors isn't in my job object after it fails, and I don't see any error message other than what I get from printing the exception itself.

    # Kick off the load job from the GCS URI; no explicit schema is passed
    job = bqClient.client.import_data_from_uris([gsFile], dataset, table, schema=None)
    try:
        # Block until the job completes; raises on failure
        job_id, _results = bqClient.client.wait_for_job(job)
        print("Job ID: " + str(job_id))
        print("Results: " + str(_results))
    except Exception as e:
        print(str(e))
        print(str(job))

My code works great for some tables but not others, so I'm trying to find out what's wrong with this one particular table.

ampkeegan · Oct 05 '18

This sounds like a similar issue. There should be an errors property on the job.
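
In the raw BigQuery v2 job resource, those details live under status.errorResult and status.errors, so a minimal sketch along these lines should surface them (assuming job here is the job dict the client hands back):

    status = job.get('status', {})
    if 'errorResult' in status:
        # The fatal error that failed the job
        print(status['errorResult'])
        # The full "error stream" the message refers to, one entry per problem
        for err in status.get('errors', []):
            print(err)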

tylertreat · Oct 05 '18

I'm not seeing errors or insertErrors on my job.

I have another piece of code which does have errors on occasion:

    job = bqClient.client.push_rows(
        to_bq_jobs[path]['dataset'],
        to_bq_jobs[path]['table'],
        rowsToAdd[rowCount:topCount],
        #insert_id_key=to_bq_jobs[path]['id_to_match']
    )
    #print(str(job))
    # Streaming inserts report per-row failures under 'insertErrors'
    if 'insertErrors' in job:

And that lets me print out the errors. However, on this load job there is no errors or insertErrors key in the job dict.

The status dict has 'state': "RUNNING", and there isn't any error listed.

I tried changing to job_id, _results = bqClient.client.wait_for_job(job), which still throws the same error, and _results doesn't contain anything.

ampkeegan · Oct 08 '18

Can you dump the contents of the dict that gets returned?

tylertreat · Oct 08 '18

I've been testing with a newline-delimited JSON import, which fails with a similar error:

Reason:invalid. Message:Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
{
    "configuration": {
        "jobType": "LOAD",
        "load": {
            "destinationTable": {
                "datasetId": "redacted",
                "projectId": "redacted",
                "tableId": "stages_copy"
            },
            "schema": {
                "fields": [
                    {
                        "mode": "NULLABLE",
                        "name": "Stage_id",
                        "type": "INTEGER"
                    },
                    {
                        "mode": "NULLABLE",
                        "name": "Stage_Order",
                        "type": "INTEGER"
                    },
                    {
                        "mode": "NULLABLE",
                        "name": "Stage_Name",
                        "type": "STRING"
                    },
                    {
                        "mode": "NULLABLE",
                        "name": "Stage_Pipeline_id",
                        "type": "INTEGER"
                    }
                ]
            },
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "sourceUris": [
                "gs://redacted/stages.json"
            ]
        }
    },
    "etag": "\"redacted/29pKp--d60WMuqds86QyFCCo47Q\"",
    "id": "redacted",
    "jobReference": {
        "jobId": "redacted",
        "location": "US",
        "projectId": "redacted"
    },
    "kind": "bigquery#job",
    "selfLink": "https://www.googleapis.com/bigquery/v2/projects/redacted?location=US",
    "statistics": {
        "creationTime": "1539023656058",
        "startTime": "1539023656543"
    },
    "status": {
        "state": "RUNNING"
    },
    "user_email": "redacted"
}
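
Possibly that dump is just the snapshot from when the job was inserted, since status.state is still RUNNING; the error stream would only show up once a later jobs.get reports the job as DONE. A minimal sketch of re-fetching the finished job with the raw API client (project and job IDs hypothetical, and assuming application default credentials are available):

    from googleapiclient.discovery import build

    # Re-fetch the job resource after the load has finished
    service = build('bigquery', 'v2')
    resource = service.jobs().get(projectId='my-project', jobId='my-job-id').execute()

    status = resource['status']
    if status['state'] == 'DONE' and 'errorResult' in status:
        # 'errors' is the error stream the failure message points to
        for err in status.get('errors', []):
            print(err)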


ampkeegan · Oct 08 '18

Our workaround is to surround all our job.result() calls with a try/except that prints out job.errors, but it would be really nice if the errors were just printed out so that we didn't have to do that!
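
For reference, a minimal sketch of that pattern with google-cloud-bigquery (bucket, dataset, and table names hypothetical):

    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON)
    job = client.load_table_from_uri(
        'gs://my-bucket/stages.json', 'my_dataset.stages_copy', job_config=job_config)
    try:
        job.result()  # blocks until done; raises if the load fails
    except Exception:
        # job.errors holds the full error stream, one dict per bad row/field
        for err in job.errors or []:
            print(err)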

bencaine1 · Oct 24 '18