datachain
datachain copied to clipboard
Remove default value checks
Companion issue of https://github.com/iterative/studio/pull/11236
With making CH accept nullable columns, we no longer need special checks for default values in tests, as they will be None always.
Codecov Report
:x: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 87.71%. Comparing base (86c2cb1) to head (8f01a7a).
:warning: Report is 300 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/datachain/data_storage/db_engine.py | 0.00% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #852 +/- ##
==========================================
- Coverage 87.73% 87.71% -0.02%
==========================================
Files 131 131
Lines 11804 11805 +1
Branches 1601 1601
==========================================
- Hits 10356 10355 -1
- Misses 1045 1047 +2
Partials 403 403
| Flag | Coverage Δ | |
|---|---|---|
| datachain | 87.64% <66.66%> (-0.41%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Deploying datachain-documentation with
Cloudflare Pages
| Latest commit: |
8f01a7a
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://13705299.datachain-documentation.pages.dev |
| Branch Preview URL: | https://ilongin-11161-clickhouse-nul.datachain-documentation.pages.dev |
Can we introduce something like strict mode in schema, similar to pydantic to support non-nullable types?
https://docs.pydantic.dev/latest/concepts/strict_mode/
Can we introduce something like
strictmode in schema, similar topydanticto support non-nullable types?https://docs.pydantic.dev/latest/concepts/strict_mode/
The thing here is actually this is not null vs not null constraint, but making sure we don't rely on Clickhouse defaults when we insert empty value (None).
Regarding nulls, Clickhouse is syntactically compatible with other relational databases but not semantically which means that if we define NOT NULL constraint on a column (default behavior before this PR) it will not reject data that has empty value, instead it will put default value for that column type, e.g int -> 0. This is because CH has emphasized for performance and analytical use cases and checking every single value for constraint would be slow.
Because of this, we currently have 2 different behaviors in SaaS vs local:
- In SQLite we allow nulls everywhere which behaves as expected
- In Clickhouse we don't allow nulls anywhere, but CH behaves in a way to put default values instead
The simplest solution is to make all columns nullable in CH, which is what this (and Studio companion) PR is up to, as having those default values is not correct and expected from user perspective. The only drawback is performance implication as it could slow down queries by double, according to what people are saying.
Your suggestion goes one step further -> adding a new feature to the system for setting NULL constraints in user model fields. This is easy to do in SQLite, but in Clickhouse we need to add special constraints or maybe this setting although it seems like it's global and we need it for specific columns
Regarding the joins, as they were also mentioned multiple times in our calls, we have special global setting join_use_nulls set to 1 which means CH will convert all columns which get empty value in joins (e.g. after using outer join) to nullable (same as Studio companion PR is doing to ALL columns being used in CH).
@ilongin what is the status of this?
@ilongin what is the status of this?
@shcheklein It's ready for merge but I know we needed some more discussions / clarification in one of our meetings so I wrote this last comment (and put a link in slack). I would like to hear another opinion / confirmation before merging it.