Python wrappers for source creation
Summary
Adds functions to the Python SDK for creating source objects. This is a fully backward-compatible change: users can continue to use the Thrift-based classes alongside the new Python wrappers.
Why / Goal
The primary motivation is to allow extra attributes at the source level. As with GroupBy and Join, all extra arguments are stored in the customJson attribute of the Thrift object.
Sources can carry all sorts of metadata, e.g. bootstrap.server for a Kafka source, which can be helpful for a streaming job.
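A minimal sketch of the idea, under stated assumptions: `ThriftEventSource` is a stand-in dataclass rather than the real generated `ttypes.EventSource`, and the wrapper name `event_source`, its parameters, and the `kafka_bootstrap` key are hypothetical. It only shows how extra keyword arguments could be serialized into `customJson`.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ThriftEventSource:
    """Stand-in for the Thrift-generated class; not the real ttypes.EventSource."""
    table: Optional[str] = None
    topic: Optional[str] = None
    customJson: Optional[str] = None


def event_source(table: str, topic: Optional[str] = None, **kwargs: Any) -> ThriftEventSource:
    """Hypothetical wrapper: any extra keyword arguments are stored in customJson."""
    # Serialize extras only when present, so plain sources keep customJson unset.
    custom_json = json.dumps(kwargs, sort_keys=True) if kwargs else None
    return ThriftEventSource(table=table, topic=topic, customJson=custom_json)


# Usage: the extra attribute is illustrative, not a documented key.
src = event_source(
    table="db.events",
    topic="events_topic",
    kafka_bootstrap="broker:9092",
)
extras = json.loads(src.customJson)
```

A downstream job could then read the extras back with `json.loads`, as in the last line above.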
Additional benefits:
- Less verbose API. Before:
    my_source = ttypes.Source(
        events=ttypes.EventSource(
            table=...
        )
    )
  After:
    my_source = source.EventSource(
        table=...
    )
- Improved API consistency: existing Python wrappers (e.g. GroupBy, Join) use Pythonic snake case for parameter names, whereas code generated from Thrift uses camel case (e.g. snapshotTable in EntitySource).
- Omitting a required attribute produces a more meaningful error.
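The last two points could look roughly like the sketch below. It assumes a stand-in dataclass in place of the generated Thrift class, and the wrapper name `entity_source` and the error message are hypothetical. The wrapper accepts a snake-case parameter, maps it to the camel-case Thrift field, and fails early with a descriptive message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThriftEntitySource:
    """Stand-in for the Thrift-generated class, which uses camelCase fields."""
    snapshotTable: Optional[str] = None


def entity_source(snapshot_table: Optional[str] = None) -> ThriftEntitySource:
    """Hypothetical wrapper with a snake_case parameter and early validation."""
    if not snapshot_table:
        # Fail at definition time with a message that names the missing argument.
        raise ValueError("entity_source requires a non-empty snapshot_table")
    return ThriftEntitySource(snapshotTable=snapshot_table)


users = entity_source(snapshot_table="db.users")
```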
Test Plan
- [ ] Added Unit Tests
- [x] Covered by existing CI
- [ ] Integration tested
Checklist
- [ ] Documentation update
Reviewers
Hey @nikhil-zlai, thanks for the review! There's more use for those extra attributes than just the Kafka host and port. For example, I want to store the Avro JSON schema near the source definition and attach it to the source. Or specify all kinds of Kafka consumer properties.
TopicInfo has limited usage since it treats / and = as special symbols, so if I were to add anything base64-encoded to the topic string, it would simply break.
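To illustrate the base64 problem, here is a minimal sketch. `parse_topic_info` is a hypothetical parser that only mimics the `/` and `=` delimiters; it is not the actual TopicInfo code. Because the base64 alphabet includes `/` and uses `=` for padding, the encoded value gets split apart.

```python
import base64


def parse_topic_info(topic):
    """Hypothetical parser for 'name/key1=value1/key2=value2' strings."""
    name, *pairs = topic.split("/")
    params = {}
    for pair in pairs:
        # Everything after the first '=' in a segment is treated as the value.
        key, _, value = pair.partition("=")
        params[key] = value
    return name, params


# Two 0xFF bytes encode to "//8=", which contains both special symbols.
payload = base64.b64encode(b"\xff\xff").decode("ascii")
topic = "my_topic/schema=" + payload
name, params = parse_topic_info(topic)

# The value stored under "schema" no longer matches what was encoded.
round_trips = params.get("schema") == payload
```

Here `round_trips` ends up `False`: the payload is cut at its own `/` characters before the parser ever sees it as one value.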
Updated docs
> I want to store the Avro JSON schema near the source definition and attach it to the source. Or specify all kinds of Kafka consumer properties.
I see. That definitely justifies the change.
@hzding621, please take another look
Ping @hzding621