[improve][pulsar-io] Added support for generic record and raw JSON string schemas to CassandraSink

Open david-streamlio opened this issue 3 years ago • 11 comments

Motivation

The current implementation of the Cassandra Sink connector supports only a single schema type (key, string), which is not useful in production. So I modified the code to support any schema type in Cassandra.

Modifications

Added classes that interrogate the database to determine the schema type at runtime. I also added a framework that uses the table metadata to extract the values from the supported incoming schema types (GenericRecord and String).
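
To illustrate the idea only (this is a minimal sketch, not the code in this PR): once the table's column names are known from the database metadata, the sink can build a parameterized insert and pull one value per column from the incoming record. The class and method names below are hypothetical, and a plain `Map` stands in for the fields of a GenericRecord or a parsed JSON string, so the example runs without Pulsar or Cassandra dependencies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: names are illustrative and do not come from the PR.
final class ColumnValueExtractor {

    // Builds a parameterized INSERT with one placeholder per table column.
    static String buildInsert(String table, List<String> columns) {
        String names = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + marks + ")";
    }

    // Returns the record's values in table-column order.
    // A column with no matching field in the record yields null.
    static List<Object> extractValues(List<String> columns, Map<String, Object> record) {
        List<Object> values = new ArrayList<>(columns.size());
        for (String column : columns) {
            values.add(record.get(column));
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed column list; a real sink would read it from the table metadata.
        List<String> columns = List.of("id", "name", "email");

        // Stand-in for the fields of an incoming record.
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "Ada");
        record.put("id", 1);

        System.out.println(buildInsert("ks.users", columns));
        System.out.println(extractValues(columns, record));
    }
}
```

Driving the extraction from the table's columns rather than from the record's fields means extra fields in the message are ignored and missing ones become nulls, which is one reasonable policy; a real connector might prefer to reject such records.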

Verifying this change

  • [x] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change added tests and can be verified as follows:

Added integration tests for testing against a Cassandra database

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes)
  • The public API: (no)
  • The schema: (yes)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (no)

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • [x] doc-required (Your PR needs to update docs and you will update later)

david-streamlio avatar Jun 22 '22 14:06 david-streamlio

@tspannhw can you review and upvote?

david-streamlio avatar Jul 21 '22 17:07 david-streamlio

The pr had no activity for 30 days, mark with Stale label.

github-actions[bot] avatar Aug 21 '22 02:08 github-actions[bot]

/pulsarbot run-failure-checks

david-streamlio avatar Oct 10 '22 20:10 david-streamlio

/pulsarbot ready-to-test

david-streamlio avatar Oct 11 '22 16:10 david-streamlio

Codecov Report

Merging #16179 (21cedba) into master (1b5722d) will decrease coverage by 0.04%. The diff coverage is n/a.

@@             Coverage Diff              @@
##             master   #16179      +/-   ##
============================================
- Coverage     46.34%   46.29%   -0.05%     
- Complexity    10394    10420      +26     
============================================
  Files           703      703              
  Lines         68838    68858      +20     
  Branches       7379     7383       +4     
============================================
- Hits          31905    31880      -25     
- Misses        33324    33375      +51     
+ Partials       3609     3603       -6     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 46.29% <ø> (-0.05%) :arrow_down: |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...java/org/apache/pulsar/proxy/stats/TopicStats.java | 58.82% <0.00%> (-41.18%) :arrow_down: |
| ...lsar/broker/loadbalance/impl/ThresholdShedder.java | 3.27% <0.00%> (-27.87%) :arrow_down: |
| .../apache/pulsar/broker/loadbalance/LoadManager.java | 61.11% <0.00%> (-16.67%) :arrow_down: |
| ...rg/apache/pulsar/broker/lookup/v1/TopicLookup.java | 60.00% <0.00%> (-13.34%) :arrow_down: |
| ...roker/service/persistent/MessageDeduplication.java | 43.23% <0.00%> (-10.49%) :arrow_down: |
| ...org/apache/pulsar/broker/loadbalance/LoadData.java | 58.33% <0.00%> (-8.34%) :arrow_down: |
| ...he/pulsar/client/impl/PartitionedProducerImpl.java | 30.34% <0.00%> (-5.13%) :arrow_down: |
| .../apache/pulsar/client/impl/BatchMessageIdImpl.java | 67.50% <0.00%> (-4.73%) :arrow_down: |
| ...tent/PersistentDispatcherSingleActiveConsumer.java | 55.17% <0.00%> (-4.71%) :arrow_down: |
| ...pulsar/broker/service/PulsarCommandSenderImpl.java | 73.84% <0.00%> (-4.62%) :arrow_down: |
| ... and 64 more | |

codecov-commenter avatar Oct 13 '22 16:10 codecov-commenter

/pulsarbot run-failure-checks

david-streamlio avatar Oct 13 '22 19:10 david-streamlio

@eolivelli I have made the requested changes. Can you PTAL when you get a chance? Thanks!

david-streamlio avatar Oct 24 '22 15:10 david-streamlio

@eolivelli, can you please take a look at this when you get the chance? Thanks again!

david-streamlio avatar Oct 31 '22 16:10 david-streamlio

@eolivelli Can I please get some feedback on the changes I made in response to your initial feedback? Thanks again for the review, I really appreciate it.

david-streamlio avatar Nov 11 '22 20:11 david-streamlio

/pulsarbot run-failure-checks

david-streamlio avatar Dec 16 '22 17:12 david-streamlio

@tisonkun @eolivelli I would appreciate another review when you have the time.

david-streamlio avatar Jan 13 '23 17:01 david-streamlio