optimus
External table datasource from Google Sheets should autodetect schema when not supplied
The BigQuery external table API supports automatic schema detection for an external table source through a boolean property - ref.
If the target field names are the same as the source column names in the Google Spreadsheet, it is better to omit the schema entirely than to list the same column names one by one.
Spec with explicitly stated schema (AutoDetect = False)
name: project.dataset.table
type: external_table
version: 1
spec:
  schema:
    - name: column_name
      type: STRING
  source:
    type: google_sheets
    uris:
      - https://docs.google.com/spreadsheets/d/spreadsheet_id
    config:
      range: Sheet1!A1:B4
      skip_leading_rows: 1
Spec without schema (AutoDetect = True)
name: project.dataset.table
type: external_table
version: 1
spec:
  source:
    type: google_sheets
    uris:
      - https://docs.google.com/spreadsheets/d/spreadsheet_id
    config:
      range: Sheet1!A1:B4
      skip_leading_rows: 1
This looks neat. So if the user has explicitly provided a schema, we will assume auto-detect is disabled?
Yes. The user will only need to provide a schema explicitly if columns need to be renamed or if the spreadsheet does not have a header row.
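To make the agreed rule concrete, here is a rough sketch of how the flag could be derived from the spec. It is only an illustration, not the actual Optimus implementation: `Column`, `ExternalTableSpec`, `should_autodetect` and `to_external_config` are hypothetical names, and the dictionary keys follow the BigQuery REST `ExternalDataConfiguration` field names as I understand them.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Column:
    name: str
    type: str


@dataclass
class ExternalTableSpec:
    # Hypothetical, trimmed-down view of the `spec` section shown above.
    source_type: str
    uris: List[str]
    schema: List[Column] = field(default_factory=list)


def should_autodetect(spec: ExternalTableSpec) -> bool:
    """Auto-detect is on only when the user supplied no schema."""
    return len(spec.schema) == 0


def to_external_config(spec: ExternalTableSpec) -> Dict[str, Any]:
    """Build a dict shaped like an external data configuration (assumed keys)."""
    config: Dict[str, Any] = {
        "sourceFormat": spec.source_type.upper(),
        "sourceUris": list(spec.uris),
        "autodetect": should_autodetect(spec),
    }
    # Only send a schema when the user wrote one explicitly.
    if spec.schema:
        config["schema"] = {
            "fields": [{"name": c.name, "type": c.type} for c in spec.schema]
        }
    return config
```

With this shape, the first spec above (schema present) would yield `autodetect` set to false plus the explicit fields, while the second spec (no schema) would yield `autodetect` set to true and no `schema` key at all.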