Feature: Spilled ArtTree Support For Databend
Summary
Databend currently has no good way to enforce a unique constraint on data. We need an index in the storage engine to support features such as primary keys.
Feature Target
- Support high-performance duplicate checks on large data sets.
- Support spilling the index out of memory.
Plan
- First, support an in-memory ArtTree index.
- Second, support a spilled ArtTree index.
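To make the first step concrete, here is a minimal sketch of an in-memory, ART-style unique-key check. It is not a proposed Databend API: the names `UniqueIndex`, `Node`, and `Children` are hypothetical. It also simplifies the real ART: only two of the adaptive node layouts (Node4 and Node256) are shown, and there is no path compression or value storage.

```rust
/// Children of an inner node, in two of ART's adaptive layouts (hypothetical sketch).
enum Children {
    /// Small layout: up to 4 (key byte, child) pairs, searched linearly.
    Node4(Vec<(u8, Box<Node>)>),
    /// Large layout: 256 slots indexed directly by the key byte.
    Node256(Vec<Option<Box<Node>>>),
}

struct Node {
    /// True if a key ends exactly at this node.
    terminal: bool,
    children: Children,
}

impl Node {
    fn new() -> Self {
        Node { terminal: false, children: Children::Node4(Vec::new()) }
    }

    /// Look up the child for one key byte.
    fn child(&self, byte: u8) -> Option<&Node> {
        match &self.children {
            Children::Node4(v) => v.iter().find(|(b, _)| *b == byte).map(|(_, n)| &**n),
            Children::Node256(v) => v[byte as usize].as_deref(),
        }
    }

    /// Return the child for `byte`, creating it (and growing the node) if needed.
    fn child_or_insert(&mut self, byte: u8) -> &mut Node {
        // Grow Node4 -> Node256 when the small layout is full and `byte` is new.
        let grow = matches!(&self.children,
            Children::Node4(v) if v.len() == 4 && !v.iter().any(|(b, _)| *b == byte));
        if grow {
            let empty = Children::Node256((0..256).map(|_| None).collect());
            let old = std::mem::replace(&mut self.children, empty);
            if let (Children::Node4(v), Children::Node256(big)) = (old, &mut self.children) {
                for (b, n) in v {
                    big[b as usize] = Some(n);
                }
            }
        }
        match &mut self.children {
            Children::Node4(v) => {
                let pos = match v.iter().position(|(b, _)| *b == byte) {
                    Some(p) => p,
                    None => {
                        v.push((byte, Box::new(Node::new())));
                        v.len() - 1
                    }
                };
                &mut *v[pos].1
            }
            Children::Node256(v) => {
                &mut **v[byte as usize].get_or_insert_with(|| Box::new(Node::new()))
            }
        }
    }
}

/// A set of byte-string keys that rejects duplicates.
pub struct UniqueIndex {
    root: Node,
    len: usize,
}

impl UniqueIndex {
    pub fn new() -> Self {
        UniqueIndex { root: Node::new(), len: 0 }
    }

    /// Insert `key`; returns false if the key is already present.
    pub fn insert(&mut self, key: &[u8]) -> bool {
        let mut node = &mut self.root;
        for &b in key {
            node = node.child_or_insert(b);
        }
        if node.terminal {
            return false;
        }
        node.terminal = true;
        self.len += 1;
        true
    }

    pub fn contains(&self, key: &[u8]) -> bool {
        let mut node = &self.root;
        for &b in key {
            match node.child(b) {
                Some(n) => node = n,
                None => return false,
            }
        }
        node.terminal
    }

    pub fn len(&self) -> usize {
        self.len
    }
}
```

With this sketch, a duplicate check is just `insert` returning `false` for a key that was already inserted. A real design would add the Node16 and Node48 layouts, path compression, and leaves that point at row locations.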
Enhancement
- Support Concurrent ArtTree Design.
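As one possible direction for the concurrent design, here is a rough sketch of the per-node version lock used by optimistic lock coupling, one of the schemes discussed in "The ART of Practical Synchronization". The type `VersionLock` and its method names are hypothetical, and the sketch shows only the lock protocol. Safely reading node data between `read_begin` and `read_validate` needs additional care that is not shown here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-node lock word: bit 0 is the "locked" flag,
/// the remaining bits act as a version counter.
pub struct VersionLock(AtomicU64);

impl VersionLock {
    pub fn new() -> Self {
        VersionLock(AtomicU64::new(0))
    }

    /// Start an optimistic read: returns the current version,
    /// or None if a writer currently holds the lock.
    pub fn read_begin(&self) -> Option<u64> {
        let v = self.0.load(Ordering::Acquire);
        if v & 1 == 1 { None } else { Some(v) }
    }

    /// Finish an optimistic read: true if no writer touched the node
    /// since `read_begin` returned `version`; otherwise the reader restarts.
    pub fn read_validate(&self, version: u64) -> bool {
        self.0.load(Ordering::Acquire) == version
    }

    /// Try to take the write lock by setting the lock bit.
    pub fn try_write_lock(&self) -> bool {
        let v = self.0.load(Ordering::Relaxed);
        v & 1 == 0
            && self
                .0
                .compare_exchange(v, v + 1, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
    }

    /// Release the write lock: clears the lock bit and moves to a new version.
    pub fn write_unlock(&self) {
        self.0.fetch_add(1, Ordering::Release);
    }
}
```

Readers never write to the lock word, so lookups on hot upper-level nodes would not contend with each other. Writers invalidate concurrent readers by changing the version.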
references:
- https://db.in.tum.de/~leis/papers/ART.pdf
- "The ART of Practical Synchronization"
- DuckDB
cc @dantengsky I think we need to support this soon.
https://duckdb.org/2022/07/27/art-storage.html
We can investigate how Snowflake implements this (the unique constraint on data). I also have a question: is the ART tree suitable for object storage?
Well, in fact, Snowflake has Unistore (https://www.snowflake.com/en/data-cloud/workloads/unistore/). It is still under development, and we can't get the source code or design details, but we can still find some materials, such as https://www.areto.de/wp-content/uploads/snowflake-unistore-Solution-Brief.pdf. By the way, I chose the ArtTree index for Databend because there is a good reference in an open-source product and good materials are available to us. However, this is not final: this issue records only a tentative decision. We need to do more surveys, and I'm preparing the ArtTree design details for Databend.
On how Snowflake implements the unique constraint: for now, they don't support it.
Is the ART tree suitable for object storage? I think this question is very important, because Databend is not a memory-oriented or disk-oriented database.
Regarding "for now, they don't support it": Snowflake already supports it, see: https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint?utm_source=snowscope&utm_medium=serp&utm_term=primary+key
Well, I think you should look at https://docs.snowflake.com/en/sql-reference/constraints. They only accept the constraint as a definition; they don't enforce it.
On whether the ART tree is suitable for object storage: good question. Here is an initial assessment:
- S3 storage doesn't support update-in-place. The good news is that we can make the spilled ArtTree append-only.
- For transactions (ACID), we can treat an index change as a mutation operation.
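To illustrate the append-only idea, here is a rough sketch under stated assumptions; it is not the planned design. `ObjectStore` is a hypothetical in-memory stand-in for S3 that refuses overwrites, `SpilledIndex` and the segment naming are invented for the example, and a `BTreeSet` stands in for the in-memory ArtTree. When the in-memory part reaches a threshold, it is written out as a new immutable segment instead of updating an existing object.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for an object store: objects are written once, never modified.
struct ObjectStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore {
    fn put(&mut self, name: String, data: Vec<u8>) {
        let previous = self.objects.insert(name, data);
        assert!(previous.is_none(), "objects are immutable: no update-in-place");
    }

    fn get(&self, name: &str) -> Option<&[u8]> {
        self.objects.get(name).map(|v| v.as_slice())
    }
}

/// Serialize keys as [u32 little-endian length][key bytes] records.
fn encode(keys: &BTreeSet<Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    for k in keys {
        out.extend_from_slice(&(k.len() as u32).to_le_bytes());
        out.extend_from_slice(k);
    }
    out
}

/// Linear scan of one encoded segment for `key`.
fn segment_contains(data: &[u8], key: &[u8]) -> bool {
    let mut pos = 0;
    while pos + 4 <= data.len() {
        let len = u32::from_le_bytes([data[pos], data[pos + 1], data[pos + 2], data[pos + 3]])
            as usize;
        pos += 4;
        if &data[pos..pos + len] == key {
            return true;
        }
        pos += len;
    }
    false
}

struct SpilledIndex {
    store: ObjectStore,
    /// Names of spilled segments, oldest first; only ever appended to.
    segments: Vec<String>,
    /// In-memory part (stand-in for the in-memory ArtTree).
    buffer: BTreeSet<Vec<u8>>,
    spill_threshold: usize,
}

impl SpilledIndex {
    fn new(spill_threshold: usize) -> Self {
        SpilledIndex {
            store: ObjectStore { objects: HashMap::new() },
            segments: Vec::new(),
            buffer: BTreeSet::new(),
            spill_threshold,
        }
    }

    /// Check the in-memory part first, then every spilled segment.
    fn contains(&self, key: &[u8]) -> bool {
        self.buffer.contains(key)
            || self.segments.iter().any(|name| {
                self.store.get(name).map_or(false, |d| segment_contains(d, key))
            })
    }

    /// Insert `key`; returns false if it would violate uniqueness.
    fn insert(&mut self, key: &[u8]) -> bool {
        if self.contains(key) {
            return false;
        }
        self.buffer.insert(key.to_vec());
        if self.buffer.len() >= self.spill_threshold {
            self.spill();
        }
        true
    }

    /// Write the in-memory part as a brand-new object and start a fresh buffer.
    fn spill(&mut self) {
        let name = format!("segment-{:06}", self.segments.len());
        self.store.put(name.clone(), encode(&self.buffer));
        self.segments.push(name);
        self.buffer.clear();
    }
}
```

The cost of this approach is that lookups may touch many segments, so a real design would need compaction or a per-segment filter. Publishing the updated segment list is the natural place to tie into a table mutation.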
This issue is just an initial approach to solving our unique key problem. By introducing a new index, we can also give the optimizer more information to speed up queries and mutation operations. I need to do more research and write detailed docs for this design, which may take a long time.