ClickHouse
Add new features in schema inference
Changelog category (leave one):
- New Feature
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add new settings to control schema inference from text formats:
- input_format_try_infer_dates - try to infer dates from strings.
- input_format_try_infer_datetimes - try to infer datetimes from strings.
- input_format_try_infer_integers - try to infer Int64 instead of Float64.
- input_format_json_try_infer_numbers_from_strings - try to infer numbers from JSON strings in JSON formats.
All these settings are enabled by default.
Examples:
:) desc format(JSONEachRow, '{"date" : "2020-01-01"}') settings input_format_try_infer_dates=1;
┌─name─┬─type───────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ date │ Nullable(Date) │              │                    │         │                  │                │
└──────┴────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
:) desc format(JSONEachRow, '{"date" : "2020-01-01 19:00:00"}') settings input_format_try_infer_datetimes=1
┌─name─┬─type────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ date │ Nullable(DateTime64(9)) │              │                    │         │                  │                │
└──────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
:) desc format(JSONEachRow, '{"int" : 42}') settings input_format_try_infer_integers=1
┌─name─┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ int  │ Nullable(Int64) │              │                    │         │                  │                │
└──────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
:) desc format(JSONEachRow, '{"int" : "42"}') settings input_format_json_try_infer_numbers_from_strings=1, input_format_try_infer_integers=1
┌─name─┬─type────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ int  │ Nullable(Int64) │              │                    │         │                  │                │
└──────┴─────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
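For comparison, the same inputs can be described with the new settings switched off. This is only a sketch built from the setting names and descriptions above. The fallback types in the comments are what those descriptions imply, not output captured from a real run, so treat them as assumptions.

```sql
-- Both date-related settings are disabled so that neither inference path applies.
-- Assumed fallback: the value stays a string.
DESC format(JSONEachRow, '{"date" : "2020-01-01"}')
SETTINGS input_format_try_infer_dates = 0, input_format_try_infer_datetimes = 0;

-- Integer inference disabled.
-- Assumed fallback: Float64, since the setting is described as "Int64 instead of Float64".
DESC format(JSONEachRow, '{"int" : 42}')
SETTINGS input_format_try_infer_integers = 0;

-- Number inference from JSON strings disabled.
-- Assumed fallback: the quoted value stays a string.
DESC format(JSONEachRow, '{"int" : "42"}')
SETTINGS input_format_json_try_infer_numbers_from_strings = 0;
```

Since all four settings are enabled by default, setting them to 0 as above would be the way to get back the more conservative inference for files where date-like or number-like strings should not be reinterpreted.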
Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/
Maybe we can also try to turn on some settings by default. @alexey-milovidov what do you think?
Yes, let's do it. Let's turn on every setting that is mostly safe to use by default.
Test failures are unrelated.