Feature: Spilled ArtTree Support For Databend

Open JackTan25 opened this issue 1 year ago • 9 comments

Summary

Databend currently has no good way to enforce a unique constraint on data. We need to support an index in the storage engine, along with features such as primary keys.

Feature Target

  • Support high-performance deduplication checks on large data sets.
  • Support spilling when the index does not fit in memory.

Plan

  • First, support an in-memory ArtTree index.
  • Second, support a spilled ArtTree index.
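
To make the first step of the plan concrete, here is a minimal sketch of an in-memory uniqueness check. It is only an illustration: it uses a plain byte-wise trie built from Python dicts, not a real ART (there are no adaptive Node4/Node16/Node48/Node256 layouts and no path compression), and the class name `TrieUniqueIndex` is hypothetical rather than anything from Databend.

```python
class TrieUniqueIndex:
    """Byte-wise trie used as a uniqueness check (simplified stand-in for ART)."""

    def __init__(self):
        self._root = {}
        self._len = 0

    def insert(self, key: bytes) -> bool:
        """Insert key; return False if the key already exists (a duplicate)."""
        node = self._root
        for b in key:
            # Descend one byte at a time, creating child nodes as needed.
            node = node.setdefault(b, {})
        if None in node:  # the None entry marks "a key ends at this node"
            return False
        node[None] = True
        self._len += 1
        return True

    def contains(self, key: bytes) -> bool:
        """Return True if key was inserted before."""
        node = self._root
        for b in key:
            node = node.get(b)
            if node is None:
                return False
        return None in node

    def __len__(self) -> int:
        return self._len
```

A caller would reject a row whenever `insert` returns `False`. A real ART keeps the same lookup path but shrinks memory use by sizing each node to its number of children.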

Enhancement

  • Support Concurrent ArtTree Design.

references:

  1. https://db.in.tum.de/~leis/papers/ART.pdf
  2. "The ART of Practical Synchronization"
  3. DuckDB

JackTan25 avatar Jan 24 '24 19:01 JackTan25

cc @dantengsky I think we need to support this soon.

JackTan25 avatar Jan 24 '24 19:01 JackTan25

https://duckdb.org/2022/07/27/art-storage.html

sundy-li avatar Jan 25 '24 02:01 sundy-li

We can investigate how Snowflake implements this (data unique constraint). I also have a question: is an ART tree suitable for object storage?

Dousir9 avatar Jan 25 '24 03:01 Dousir9

Well, in fact, Snowflake has Unistore (https://www.snowflake.com/en/data-cloud/workloads/unistore/). They are still developing it, and we can't get the source code or design details, but we can still find some materials, such as https://www.areto.de/wp-content/uploads/snowflake-unistore-Solution-Brief.pdf. By the way, I chose an ArtTree index for Databend because there is a good reference implementation in an open-source product and good materials are available to us. However, this is not final. This issue reflects only a temporary decision: we need to do more surveys, and I'm preparing the ArtTree design details for Databend.

JackTan25 avatar Jan 25 '24 07:01 JackTan25

We can investigate how Snowflake implements this (data unique constraint). I also have a question: is an ART tree suitable for object storage?

For now, they don't support it.

JackTan25 avatar Jan 25 '24 07:01 JackTan25

Is an ART tree suitable for object storage? I think this question is very important, because Databend is not a memory-oriented or disk-oriented database.

Dousir9 avatar Jan 25 '24 08:01 Dousir9

We can investigate how Snowflake implements this (data unique constraint). I also have a question: is an ART tree suitable for object storage?

For now, they don't support it.

Snowflake already supports it; see: https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint?utm_source=snowscope&utm_medium=serp&utm_term=primary+key

Dousir9 avatar Jan 25 '24 08:01 Dousir9

We can investigate how Snowflake implements this (data unique constraint). I also have a question: is an ART tree suitable for object storage?

For now, they don't support it.

Snowflake already supports it; see: https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint?utm_source=snowscope&utm_medium=serp&utm_term=primary+key

Well, take a look at https://docs.snowflake.com/en/sql-reference/constraints. They accept the constraint only as a definition; they don't enforce it.

JackTan25 avatar Jan 25 '24 16:01 JackTan25

Is an ART tree suitable for object storage? I think this question is very important, because Databend is not a memory-oriented or disk-oriented database.

Good question. Here is an initial assessment:

  1. S3 storage doesn't support update-in-place. The good news is that we can build an append-only spilled ArtTree.
  2. As for transactions and ACID, we can treat an index update as a mutation operation.
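
The append-only idea in point 1 can be sketched roughly as follows. This is an assumption-laden illustration, not the planned design: the object store is a plain dict standing in for S3, the in-memory buffer is a Python set rather than an ART, each spilled segment is a sorted run of keys rather than serialized ART nodes, and the names `AppendOnlyUniqueIndex`, `spill_threshold`, and `segment-NNNNNN` are all hypothetical.

```python
import bisect


class AppendOnlyUniqueIndex:
    """Uniqueness check whose spilled state is only ever appended, never updated."""

    def __init__(self, store: dict, spill_threshold: int = 4):
        self.store = store            # object name -> sorted tuple of keys (S3 stand-in)
        self.mem = set()              # keys not yet spilled
        self.segments = []            # names of spilled segments, oldest first
        self.spill_threshold = spill_threshold

    def contains(self, key: bytes) -> bool:
        if key in self.mem:
            return True
        # Check spilled segments, newest first, with a binary search in each.
        for name in reversed(self.segments):
            seg = self.store[name]
            i = bisect.bisect_left(seg, key)
            if i < len(seg) and seg[i] == key:
                return True
        return False

    def insert(self, key: bytes) -> bool:
        """Insert key; return False if it is a duplicate."""
        if self.contains(key):
            return False
        self.mem.add(key)
        if len(self.mem) >= self.spill_threshold:
            self._spill()
        return True

    def _spill(self) -> None:
        # Write the buffer as a brand-new immutable object; existing objects
        # are never modified, which fits stores without update-in-place.
        name = f"segment-{len(self.segments):06d}"
        assert name not in self.store
        self.store[name] = tuple(sorted(self.mem))
        self.segments.append(name)
        self.mem = set()
```

In this sketch lookups get slower as segments accumulate, so a real design would presumably need some form of compaction or merging, which the thread does not yet specify.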

This issue is just an initial approach to solving our unique key problem. By introducing a new index, we can also give the optimizer more information to speed up queries and mutation operations. I need to do more research and write detailed docs for this design, which may take a long time.

JackTan25 avatar Jan 25 '24 16:01 JackTan25