Starrocks

Open phaethon opened this issue 8 months ago • 7 comments

Description

This is an initial implementation of StarRocks support as a separate destination. It implements two load methods: Stream Load, and INSERT INTO ... SELECT ... FROM FILES() (used when S3-compatible staging is configured).
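For readers who have not used either load path, here is a rough Python sketch of what each one looks like on the wire. It is not the code from this PR. The function names are hypothetical, the Stream Load URL shape and headers follow my understanding of the StarRocks HTTP interface (FE HTTP port 8030 is assumed as the default), and any FILES() properties beyond `path` and `format` are passed through unchecked. Nothing is sent over the network.

```python
from base64 import b64encode


def stream_load_request(fe_host, database, table, user, password, label, http_port=8030):
    """Describe a Stream Load call as (method, url, headers). Nothing is sent.

    Assumption: Stream Load is an HTTP PUT against the frontend node, with
    basic auth and a unique `label` header per load job.
    """
    token = b64encode(f"{user}:{password}".encode()).decode()
    url = f"http://{fe_host}:{http_port}/api/{database}/{table}/_stream_load"
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,
        "format": "json",
    }
    return "PUT", url, headers


def insert_from_files_sql(table, path, file_format="parquet", properties=None):
    """Build an INSERT INTO ... SELECT * FROM FILES(...) statement as a string.

    `properties` holds extra key/value pairs for FILES(), such as credentials
    or an endpoint for S3-compatible storage. Values are not escaped here, so
    this is only suitable for trusted input.
    """
    props = {"path": path, "format": file_format, **(properties or {})}
    args = ",\n    ".join(f'"{key}" = "{value}"' for key, value in props.items())
    return f"INSERT INTO {table}\nSELECT * FROM FILES(\n    {args}\n)"


if __name__ == "__main__":
    print(stream_load_request("localhost", "demo", "events", "root", "", "load-1"))
    print(insert_from_files_sql("events", "s3://bucket/events/*.parquet"))
```

Stream Load pushes rows from the client over HTTP, while the FILES() path asks StarRocks to pull staged files itself, which is why the second method only applies when staging is configured.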

phaethon avatar Apr 12 '25 17:04 phaethon

Deploy Preview for dlt-hub-docs ready!

Latest commit: 2e5e4074eee6d24b2fcbfa3a5e4571c20624bf5c
Latest deploy log: https://app.netlify.com/projects/dlt-hub-docs/deploys/68b6efaaee22f000085c333a
Deploy Preview: https://deploy-preview-2518--dlt-hub-docs.netlify.app

netlify[bot] avatar Apr 12 '25 17:04 netlify[bot]

@phaethon thanks for this! The code looks pretty good; I assume you run it in production somewhere? I have a few questions:

  1. We are not familiar with StarRocks. Is there an OSS version of it, or any other way to run tests against it?
  2. We can enable our standard tests on this. There will surely be a few issues to fix. Do you still have time to work on it?
  3. A similar question regarding docs. Our docs are pretty standardized, and if you provide a basic version we can improve them ourselves.
  4. There's support for S3. Are other buckets, like Azure Blob Storage, supported? Thanks again!

rudolfix avatar Apr 19 '25 06:04 rudolfix

  1. StarRocks is a fork of Apache Doris. Yes, it has an OSS version. It normally needs at least two types of nodes, frontend and backend. This image bundles both in a single all-in-one container: https://hub.docker.com/r/starrocks/allin1-ubuntu
  2. Yes, I can contribute more code. Hints on which other dlt destinations to use as samples would be helpful. E.g. StarRocks has different types of tables (primary key, duplicate key, aggregate key), and it is currently not clear to me what kind of API would naturally fit dlt for specifying the table type and its parameters.
  3. I will try to come up with at least a basic tutorial.
  4. StarRocks supports Azure, GCS, S3, and S3-compatible storage. My use case required the S3-compatible implementation; the others can be added.
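To make the table-type question in point 2 concrete, here is a small Python sketch of how a per-table hint could map onto the key clause in StarRocks DDL. It is only an illustration, assuming a hypothetical `table_type` hint with the values `primary`, `duplicate`, and `aggregate`. It is not an existing dlt or StarRocks API, and it ignores details such as key-column ordering and per-column aggregate functions beyond what the caller writes into the type strings.

```python
# Hypothetical hint values mapped to the key clause of a CREATE TABLE statement.
KEY_CLAUSES = {
    "primary": "PRIMARY KEY",
    "duplicate": "DUPLICATE KEY",
    "aggregate": "AGGREGATE KEY",
}


def create_table_sql(table, columns, table_type, key_columns):
    """Build a CREATE TABLE statement for the given table type.

    `columns` maps column names to SQL type strings (for aggregate tables the
    caller includes the aggregate function, e.g. "BIGINT SUM").
    """
    if table_type not in KEY_CLAUSES:
        raise ValueError(f"unknown table_type: {table_type!r}")
    missing = [name for name in key_columns if name not in columns]
    if missing:
        raise ValueError(f"key columns not in schema: {missing}")

    cols = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    keys = ", ".join(key_columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"{KEY_CLAUSES[table_type]} ({keys})\n"
        f"DISTRIBUTED BY HASH({keys})"
    )


if __name__ == "__main__":
    print(create_table_sql(
        "events",
        {"id": "BIGINT NOT NULL", "payload": "STRING"},
        "primary",
        ["id"],
    ))
```

Whatever API is chosen, it would need to carry at least these two pieces of information per table: the table type and the key columns.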

phaethon avatar Apr 23 '25 17:04 phaethon

@phaethon we had several discussions about whether we can take this destination into our core, and we decided we do not have the resources to do that. First we'd need to enable it for all the standard tests, which is a pretty big amount of work, and then we'd need to maintain those tests and run them with every PR. We'd also need to create documentation.

What we surely can do is add your repo to our docs alongside other destinations and provide instructions on how to use it. It would be clearly marked as community-provided, and we can of course attribute you.

Let me know if you want to do that.

rudolfix avatar Jun 05 '25 18:06 rudolfix

Yes, I would like this to be linked from the official docs if you see it as the best way to make it available to potential users. I don't have any commercial interest in this, but sooner or later there might be someone else to share the maintenance with.

This raises a couple of practical questions from the user's perspective. Do I understand correctly that you would recommend installing the fork from my repo instead of the official dlt package? To install using pip, would the user need to point to the GitHub repo, or would it make sense to have a package named dlt-starrocks or similar? To create and update the documentation, would I need to open a PR against the official repo?

phaethon avatar Jun 14 '25 13:06 phaethon

I want to participate and work through this PR. @rudolfix Please point me to where to start.

Gunnnn avatar Jul 30 '25 15:07 Gunnnn

@Gunnnn if you have ideas for contributions or StarRocks-specific issues, you can add them to the fork at https://github.com/phaethon/dlt (starrocks branch). If the fork repo gains traction, it will be more likely to get integrated into the main repo over time. And feel free to ask questions about how to use it; based on those we can create the first pieces of documentation.

phaethon avatar Aug 08 '25 08:08 phaethon

Hey @phaethon! We are cleaning up all community destinations that we are not merging into the core library. They'll get a docs page where we link to your repo. The proposal for the page is here: https://github.com/dlt-hub/dlt/pull/3326

Please tell me:

  • if you still want us to link to your fork
  • if you want to change anything in there (you can also open a PR or request changes in the comments)

We'll close this PR soon.

rudolfix avatar Nov 17 '25 15:11 rudolfix

Yes, please do add this link. I have no immediate comments on the proposed text. I suppose I can open a PR at a later stage, too, and if it is just a text update, it shouldn't be hard for anyone involved. Meanwhile, I can confirm that I have been using this myself for the last half year, and for what it does, it works well for my own needs. I am going to update it soon to the latest dlt release.

phaethon avatar Nov 17 '25 21:11 phaethon