python-bigquery-pandas
Better error message for incorrect to_gbq() table_schema
I am using pandas with BigQuery tables. Because pandas doesn't support missing values in boolean columns, BigQuery BOOLEAN columns become object columns in the dataframe. I have to use the to_gbq() table_schema argument to force those columns back to BOOLEAN.
My dataframe has 104 columns. table_schema has to include every column (until this issue is fixed).
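For illustration, here is a minimal sketch of that pattern. The column names, dataset, table, and project ID are all made up; table_schema uses the list-of-dicts format that to_gbq() accepts.

```python
import pandas as pd

# A BOOLEAN column containing NULLs comes back from BigQuery as object
# dtype, so without table_schema the type would be guessed incorrectly
# on the way back in.
df = pd.DataFrame({
    "is_active": [True, None, False],
    "score": [1.0, 2.0, 3.0],
})

# table_schema must describe every column, not just the ones whose
# types are being overridden.
table_schema = [
    {"name": "is_active", "type": "BOOLEAN"},
    {"name": "score", "type": "FLOAT"},
]

df.to_gbq(
    "my_dataset.my_table",    # hypothetical destination table
    project_id="my-project",  # hypothetical project ID
    table_schema=table_schema,
    if_exists="replace",
)
```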
If something in table_schema is wrong, I get the error:
GenericGBQException                       Traceback (most recent call last)
<ipython-input-174-5109529cfab8> in <module>()
117 project_id='bryancrampton-testing',
118 table_schema=table_schema,
--> 119 if_exists='replace')
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in to_gbq(self, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver, table_schema)
1185 verbose=verbose, reauth=reauth, if_exists=if_exists,
1186 private_key=private_key, auth_local_webserver=auth_local_webserver,
-> 1187 table_schema=table_schema)
1188
1189 @classmethod
/usr/local/lib/python2.7/dist-packages/pandas/io/gbq.pyc in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver, table_schema)
117 verbose=verbose, reauth=reauth, if_exists=if_exists,
118 private_key=private_key, auth_local_webserver=auth_local_webserver,
--> 119 table_schema=table_schema)
/usr/local/lib/python2.7/dist-packages/pandas_gbq/gbq.pyc in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver, table_schema)
979 connector.load_data(
980 dataframe, dataset_id, table_id, chunksize=chunksize,
--> 981 schema=table_schema)
982
983
/usr/local/lib/python2.7/dist-packages/pandas_gbq/gbq.pyc in load_data(self, dataframe, dataset_id, table_id, chunksize, schema)
565 ((total_rows - remaining_rows) * 100) / total_rows))
566 except self.http_error as ex:
--> 567 self.process_http_error(ex)
568
569 logger.info("\n")
/usr/local/lib/python2.7/dist-packages/pandas_gbq/gbq.pyc in process_http_error(ex)
454 # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
455
--> 456 raise GenericGBQException("Reason: {0}".format(ex))
457
458 def run_query(self, query, **kwargs):
The error message doesn't say which part of table_schema is wrong. I have to double-check all 104 columns.
I went to the BigQuery UI and looked at the error for the last job. There was another cryptic message:
Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details. (error code: invalid)
Error while reading data, error message: CSV table references column position 103, but line starting at position:0 contains only 48 columns. (error code: invalid)
It turns out table_schema had 104 fields, while the dataframe being written had only 48 columns.
So one suggestion is:
- to_gbq() could check whether table_schema has the same number of columns as the dataframe and, if not, raise an exception rather than trying to write table_schema to BigQuery; a rough sketch of such a check follows below. (Of course, this suggestion is no longer valid once #218 is fixed.)
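To make the suggestion concrete, here is a minimal sketch of the kind of pre-flight check I mean. The helper name validate_table_schema is made up and not part of pandas-gbq:

```python
import pandas as pd


def validate_table_schema(dataframe, table_schema):
    """Raise a descriptive error if table_schema doesn't match the dataframe.

    Hypothetical helper, not part of pandas-gbq. table_schema is the same
    list-of-dicts format that to_gbq() accepts.
    """
    schema_names = [field["name"] for field in table_schema]
    frame_names = list(dataframe.columns)

    if len(schema_names) != len(frame_names):
        raise ValueError(
            "table_schema has {} fields but the dataframe has {} columns".format(
                len(schema_names), len(frame_names)
            )
        )

    # Name the specific mismatches instead of failing with a generic error.
    missing = set(frame_names) - set(schema_names)
    extra = set(schema_names) - set(frame_names)
    if missing or extra:
        raise ValueError(
            "table_schema mismatch: missing fields for columns {}, "
            "extra fields {}".format(sorted(missing), sorted(extra))
        )
```

A check like this would have reported the 104-vs-48 mismatch directly, instead of surfacing it as a generic load error.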
More detailed error messages would be much appreciated.
#218 is fixed, so most of this issue should be resolved. There has also been a broad effort to improve error messages across the BigQuery API and related tools, so it's not clear whether this issue persists in other forms. If you have a new example of a confusing error message, please file a new issue. Thank you!