python-bigquery-pandas icon indicating copy to clipboard operation
python-bigquery-pandas copied to clipboard

to_gbq() failed to stream record type to BigQuery

Open LiChangNY opened this issue 6 years ago • 5 comments

Looks like there's some issue when streaming RECORD type data to BigQuery. Here're two examples how you can reproduce this issue. Any help is appreciated!

Without specifying table_schema arg, no errors but loads column b as STRING type.

df = pd.DataFrame([[0,{'test':1}]], columns = ['a', 'b'])
gbq.to_gbq(df, destination_table='temp.record_type_1', project_id=[PROJECT_NAME])

Adding table_schema would return a 400 error as it treats the RECORD type column as strings.

df = pd.DataFrame([[0,{'test':1}]], columns = ['a', 'b'])
table_schema = [
{"type": "INTEGER", "mode": "NULLABLE", "name": "a"}, 
{"type": "RECORD", "mode":"NULLABLE", "name": "b",
 "fields":[{"type": "INTEGER", "mode":"NULLABLE", "name": "test"}]
}]
gbq.to_gbq(df, destination_table='temp.record_type_2', project_id=[PROJECT_NAME], table_schema=table_schema)

returns

GenericGBQException: Reason: 400 POST https://www.googleapis.com/upload/bigquery/v2/projects/[PROJECT_NAME]/jobs?uploadType=resumable: Provided Schema does not match Table [PROJECT_NAME]:temp.record_type_2. Field b has changed type from RECORD to STRING

Package:

pandas==0.22.0
pandas-gbq==0.3.1

LiChangNY avatar Apr 06 '18 01:04 LiChangNY

Thanks for the report

Representing nested data structures is not pandantic, and this isn't currently supported in writing to GBQ.

One solution is to load them flattened to BQ and then run a BQ query to nest them

We can leave this open for tracking

max-sixty avatar Apr 06 '18 02:04 max-sixty

Thanks @maxim-lian. Running a BQ query might be a workaround for adhoc jobs. But I think I am going to serialize the data in JSON, load it to Google Cloud Storage and then copy over to BQ with the right schema.

LiChangNY avatar Apr 07 '18 14:04 LiChangNY

@LiChangNY yes that's a great solution

max-sixty avatar Apr 07 '18 17:04 max-sixty

hi any chance record fields are supported in pandas_gbq now?

hilahersz avatar Nov 29 '20 12:11 hilahersz

@hilahersz — pandas really isn't the best tool for nested data structures. I'm less involved in this library these days, but I don't think it's implemented yet.

max-sixty avatar Nov 29 '20 22:11 max-sixty

Closing this out as a "Won't Fix".

meredithslota avatar Aug 23 '23 20:08 meredithslota