
External table datasource from Google Sheets should autodetect schema when not supplied

Open vianhazman opened this issue 3 years ago • 3 comments

BigQuery's external table API supports automatic schema detection for an external table source through a boolean property (ref).

If the target field names are the same as the source field names in the Google Spreadsheet, it would be better to omit the schema entirely than to specify the same column names one by one.

vianhazman avatar Aug 16 '21 12:08 vianhazman

Spec with explicitly stated schema (AutoDetect = False)

name: project.dataset.table
type: external_table
version: 1
spec:
  schema:
  - name: column_name
    type: STRING
  source: 
    type: google_sheets
    uris:
    - https://docs.google.com/spreadsheets/d/spreadsheet_id
    config:
      range: Sheet1!A1:B4
      skip_leading_rows: 1

Spec without a schema, relying on autodetect (AutoDetect = True)

name: project.dataset.table
type: external_table
version: 1
spec:
  source: 
    type: google_sheets
    uris:
    - https://docs.google.com/spreadsheets/d/spreadsheet_id
    config:
      range: Sheet1!A1:B4
      skip_leading_rows: 1
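
As a rough illustration of how the two specs above could map onto the BigQuery table definition, here is a minimal sketch. It assumes the `spec:` section has already been parsed into a dictionary, and that the output follows the field names of the BigQuery REST `externalDataConfiguration` resource (`autodetect`, `sourceUris`, `googleSheetsOptions`). The function name `to_external_data_configuration` is hypothetical and is not taken from the Optimus codebase.

```python
from typing import Any, Dict


def to_external_data_configuration(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Map a parsed `spec:` section to an externalDataConfiguration-style dict.

    Hypothetical helper: only the google_sheets source type is handled.
    """
    source = spec["source"]
    config = source.get("config") or {}
    schema = spec.get("schema") or []

    sheets_options: Dict[str, Any] = {}
    if "range" in config:
        sheets_options["range"] = config["range"]
    if "skip_leading_rows" in config:
        sheets_options["skipLeadingRows"] = config["skip_leading_rows"]

    body: Dict[str, Any] = {
        "sourceFormat": "GOOGLE_SHEETS",
        "sourceUris": list(source["uris"]),
        "googleSheetsOptions": sheets_options,
        # Proposed rule from this issue: autodetect only when no schema is supplied.
        "autodetect": not schema,
    }
    if schema:
        # An explicit schema is passed through unchanged and disables autodetect.
        body["schema"] = {
            "fields": [{"name": f["name"], "type": f["type"]} for f in schema]
        }
    return body


source = {
    "type": "google_sheets",
    "uris": ["https://docs.google.com/spreadsheets/d/spreadsheet_id"],
    "config": {"range": "Sheet1!A1:B4", "skip_leading_rows": 1},
}
with_schema = to_external_data_configuration(
    {"schema": [{"name": "column_name", "type": "STRING"}], "source": source}
)
without_schema = to_external_data_configuration({"source": source})
```

With this mapping, the first spec yields `autodetect` set to `False` together with a `schema` entry, and the second yields `autodetect` set to `True` with no `schema` key at all.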

vianhazman avatar Aug 16 '21 12:08 vianhazman

This looks neat. So if the user has explicitly provided a schema, we will assume auto-detect is disabled?

kushsharma avatar Aug 25 '21 09:08 kushsharma

Yes. The user will only need to explicitly provide a schema if there is a column-renaming use case or if the spreadsheet does not provide a header row.
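
The two cases named in this reply can be written down as a small check. This is only a sketch of the rule as stated here, not of anything in Optimus: `needs_explicit_schema` is a hypothetical helper, `header_row` stands for the column names found in the sheet's first row (or `None` when there is no header), and `target_names` stands for the column names wanted in the table (or `None` when the user has no preference).

```python
from typing import Optional, Sequence


def needs_explicit_schema(
    header_row: Optional[Sequence[str]],
    target_names: Optional[Sequence[str]],
) -> bool:
    """Return True when the spec should carry an explicit schema.

    Hypothetical helper encoding the two cases from the discussion:
    a missing header row, or target names that differ from the header.
    """
    if not header_row:
        # No header row: there are no source names to reuse.
        return True
    if target_names is None:
        # No preferred names: the sheet's own header can be used as-is.
        return False
    # Any difference in names or order counts as a renaming use case.
    return list(target_names) != list(header_row)
```

For a sheet whose header already matches the desired column names, the check returns `False`, so the spec can omit `schema` and rely on autodetect.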

vianhazman avatar Aug 27 '21 02:08 vianhazman