
Pulsar Delta source connector (task 1): Pulsar source connector framework and configuration

Open hangc0276 opened this issue 2 years ago • 2 comments

Motivation

Apache Pulsar is a cloud-native messaging and event-streaming platform. It acts as a bridge that connects different systems through messages.

The Pulsar Delta source connector is a Pulsar IO connector for synchronizing data between Delta Lake and Pulsar. It captures data changes from Delta Lake through the Delta Standalone Reader (DSR) and writes them to Pulsar topics.

Subtask: #334

PR1: Basic framework, configuration fields, and the Delta record definition.
PR2: Define deltaReader, which reads changes from Delta and returns a list of parquet row records.
PR3: Implement DeltaReaderThread, which gets parquet records from deltaReader and puts them into the blocking queue.
PR4: Implement the source connector checkpoint mechanism.
PR5: Add source connector metrics.
PR6: Add unit tests and integration tests.
PR7: Add docs.
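To make the PR2/PR3 split concrete, here is a minimal sketch of the reader-thread-plus-blocking-queue pattern, using only `java.util.concurrent`. It is not the connector's actual code: `RowRecord`, `ChangeReader`, `startReaderThread`, and `drain` are hypothetical names made up for illustration, and the real deltaReader and DeltaReaderThread may look quite different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReaderQueueSketch {
    /** Hypothetical stand-in for one parquet row read from a Delta table version. */
    static final class RowRecord {
        final long version;
        final String payload;

        RowRecord(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    /** Hypothetical reader: returns the rows changed in one table version. */
    interface ChangeReader {
        List<RowRecord> readVersion(long version);
    }

    /** Producer: reads versions in order; put() blocks while the queue is full. */
    static Thread startReaderThread(ChangeReader reader, BlockingQueue<RowRecord> queue,
                                    long startVersion, long endVersion) {
        Thread t = new Thread(() -> {
            try {
                for (long v = startVersion; v <= endVersion; v++) {
                    for (RowRecord row : reader.readVersion(v)) {
                        queue.put(row);
                    }
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag and let the thread exit.
                Thread.currentThread().interrupt();
            }
        }, "delta-reader-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Consumer: roughly what a source's read loop would do, blocking on take(). */
    static List<RowRecord> drain(BlockingQueue<RowRecord> queue, int count)
            throws InterruptedException {
        List<RowRecord> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(queue.take());
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        // A small capacity forces the producer to wait for the consumer.
        BlockingQueue<RowRecord> queue = new LinkedBlockingQueue<>(2);
        startReaderThread(v -> List.of(new RowRecord(v, "row-" + v)), queue, 0, 4);
        for (RowRecord r : drain(queue, 5)) {
            System.out.println(r.version + " " + r.payload);
        }
    }
}
```

The bounded queue is the point of the design: if the consumer falls behind, `put()` blocks the reader thread instead of letting records pile up in memory.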

This PR is the first PR for the Pulsar Delta source connector. It contains only the base code framework and the basic configuration.

This is the design doc. https://docs.google.com/document/d/1J_SNaYW_2uxU3Y5H5klYipZq56prnFR_rIPc-hNIoq4/edit?usp=sharing

hangc0276, Apr 08 '22 08:04

@dennyglee @scottsand-db Please help take a look, thanks a lot.

hangc0276, Apr 08 '22 08:04

@scottsand-db @dennyglee Would you please help take a look if you have time?

hangc0276, May 06 '22 15:05

This repo has been deprecated, and the code has been moved under the connectors module in the https://github.com/delta-io/delta repository. Here are the migration steps to recreate this PR in the new repository location.

vkorukanti, Jul 11 '23 17:07