bigrquery icon indicating copy to clipboard operation
bigrquery copied to clipboard

Can't round-trip nested data

Open acvelozo opened this issue 5 years ago • 3 comments

Hi,

I am trying to upload nested data using bigrquery. I have been able to reproduce succesfully the example provided in issue #256, but when I try with more complicated data (different types of nesting levels using lists and dataframes), I get an error. Here is a reproducible example that captures the type of error that I am getting:

library(bigrquery)

tableToExtract <- bq_table('bigquery-public-data','google_analytics_sample','ga_sessions_20170801')
tableNested <- bq_table_download(tableToExtract,max_results=10)

tableToUpload <- bq_table(bq_test_dataset(),'test_table')
bq_table_upload(tableToUpload,tableNested)

#> Error: Invalid field name "2". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long. Table: test_table_ev_I5_pg9JKl26CP3VnsMhvWUkT5BB9uQY5G5ycLA9srMw_source [invalidQuery]

Please note this is not my exact use case, bq_table_copy() is what I would use if I needed to copy a table. My example serves to show the kind of error I am getting when trying to upload nested data, which is what I am trying to achieve.

I hope my question is clear, but please tell me if I should further clarify.

Thank you in advance, Best regards, Alexandre Velozo

acvelozo avatar Apr 11 '19 17:04 acvelozo

Minimal reprex:

library(bigrquery)
bigrquery::bq_auth("[email protected]")

df <- bq_table_download(
  "bigquery-public-data.google_analytics_sample.ga_sessions_20170801", 
  max_results = 10
)

ds <- bq_test_dataset()
bq_table_upload(bq_table(ds, "test_table"), df)
#> Error: Job 'gargle-169921.job_7ordBvUNX7552SnUeH6-1O6vFobc.US' failed
#> x Invalid field name "1". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long. Table: test_table_ev_7vtyP4JeicMWTmo7g0LSGqb9NWGoq2RYK07NEhxSJVI_source [invalidQuery]

Created on 2020-09-16 by the reprex package (v0.3.0.9001)

hadley avatar Sep 16 '20 14:09 hadley

Even more minimal reprex:

library(bigrquery)
bigrquery::bq_auth("[email protected]")

df <- bq_table_download(
  "bigquery-public-data.google_analytics_sample.ga_sessions_20170801", 
  max_results = 1
)

ds <- bq_test_dataset()
bq_table_upload(bq_table(ds, "test_table"), df["trafficSource"])
#> Error: Job 'gargle-169921.job_2G2JcarfO3y4Dd7T5Z28LOc7LvUr.US' failed
#> x Invalid field name "1". Fields must contain only letters, numbers, and underscores, start with a letter or underscore, and be at most 128 characters long. Table: test_table_ev_aB94vwdaHbVUB3ImbUCfrEoif_pjD7NimlC9Hn5tBbY_source [invalidQuery]

df2 <- df["trafficSource"]
str(df2)
#> tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
#>  $ trafficSource:List of 1
#>   ..$ :List of 9
#>   .. ..$ referralPath    : chr NA
#>   .. ..$ campaign        : chr "(not set)"
#>   .. ..$ source          : chr "(direct)"
#>   .. ..$ medium          : chr "(none)"
#>   .. ..$ keyword         : chr NA
#>   .. ..$ adContent       : chr NA
#>   .. ..$ adwordsClickInfo:List of 1
#>   .. .. ..$ :List of 12
#>   .. .. .. ..$ campaignId        : int NA
#>   .. .. .. ..$ adGroupId         : int NA
#>   .. .. .. ..$ creativeId        : int NA
#>   .. .. .. ..$ criteriaId        : int NA
#>   .. .. .. ..$ page              : int NA
#>   .. .. .. ..$ slot              : chr NA
#>   .. .. .. ..$ criteriaParameters: chr "not available in demo dataset"
#>   .. .. .. ..$ gclId             : chr NA
#>   .. .. .. ..$ customerId        : int NA
#>   .. .. .. ..$ adNetworkType     : chr NA
#>   .. .. .. ..$ targetingCriteria :List of 1
#>   .. .. .. .. ..$ :List of 1
#>   .. .. .. .. .. ..$ : NULL
#>   .. .. .. ..$ isVideoAd         : logi NA
#>   .. ..$ isTrueDirect    : logi NA
#>   .. ..$ campaignCode    : chr NA
df2$trafficSource[[1]]$adwordsClickInfo[[1]]$targetingCriteria <- NULL
bq_table_upload(bq_table(ds, "test_table"), df2)
#> Error: Job 'gargle-169921.job_bHASYBcMLC9rGNQS2LmGtnjTMmV4.US' failed
#> x Error while reading data, error message: JSON processing encountered too many errors, giving up. Rows: 1; errors: 1; max bad: 0; error percent: 0 [invalid]
#> x Error while reading data, error message: JSON parsing error in row starting at position 0: Only optional fields can be set to NULL. Field: referralPath; Value: NULL [invalid]

Created on 2020-09-16 by the reprex package (v0.3.0.9001)

hadley avatar Sep 16 '20 14:09 hadley

Would need to do some exploration to figure if this is a problem with the download or upload (or both).

hadley avatar Sep 16 '20 14:09 hadley