[Epic] Use Kafka Formats in Platform
Initiative and Theme
Materialize is Friendly; Materialize works with your existing pipelines
Problem
Some users do not use what we are supporting in M1 (FORMAT BYTES). We need to bring back the formats from pre-Platform to make our Kafka sources usable for more folks. Order of priority:
- Avro (P1)
- JSON (any more work to do here?) (P1)
- Text (P1)
- CSV (P2)
- Protobuf (P3)
Success Criteria
Users can successfully execute a CREATE SOURCE ... FROM KAFKA statement using the same formats they were able to use pre-Platform.
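As a rough illustration of what "one statement per format" could look like, here is a minimal Python sketch that renders a CREATE SOURCE ... FROM KAFKA statement for each format in the priority list. Everything concrete in it is an assumption: the helper name `create_source_sql`, the broker address, the registry URL, and the source/topic names are placeholders, the BROKER/TOPIC form follows the pre-Platform syntax as best I recall it (the Platform syntax may differ), and treating JSON as FORMAT BYTES with a downstream cast to jsonb is a guess at how JSON is covered.

```python
# Hypothetical helper: render one CREATE SOURCE statement per format under test.
# All hostnames, URLs, and object names below are placeholders, not real endpoints.

BROKER = "kafka:9092"                      # placeholder broker address
REGISTRY = "http://schema-registry:8081"   # placeholder Confluent Schema Registry URL

# Formats in the priority order listed in the Problem section.
# The clauses assume pre-Platform syntax; adjust if Platform differs.
FORMAT_CLAUSES = {
    "avro": f"FORMAT AVRO USING CONFLUENT SCHEMA REGISTRY '{REGISTRY}'",
    "json": "FORMAT BYTES",  # assumption: JSON arrives as bytes and is cast to jsonb later
    "text": "FORMAT TEXT",
    "csv": "FORMAT CSV WITH 3 COLUMNS",  # column count is arbitrary for the sketch
    "protobuf": f"FORMAT PROTOBUF USING CONFLUENT SCHEMA REGISTRY '{REGISTRY}'",
}


def create_source_sql(fmt: str) -> str:
    """Return a CREATE SOURCE statement for the given format key."""
    clause = FORMAT_CLAUSES[fmt]
    return (
        f"CREATE SOURCE {fmt}_source "
        f"FROM KAFKA BROKER '{BROKER}' TOPIC '{fmt}_topic' "
        f"{clause};"
    )


if __name__ == "__main__":
    # Print the statements in priority order so they can be pasted into a test.
    for fmt in FORMAT_CLAUSES:
        print(create_source_sql(fmt))
```

A list like this could double as a checklist when confirming that every format appears in the testdrive tests.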
Tasks
- [x] Test formats (Avro, JSON, Text, CSV, Protobuf)
- [x] Coordinate with DevEx team about existing tests (see also: https://github.com/MaterializeInc/developer-experience/issues/166)
- [x] Document results
- [x] Persist upstream schema from Confluent Schema Registry (maybe)
  - [x] Decide whether we want to do this
QA Sign-off
- [x] Make sure all formats are represented in testdrive tests and the Platform Checks framework
Time Horizon
Small
Blockers
None
Time Horizon
6 weeks
I think we're basically getting these "for free"! Check with @elindsey and @petrosagg to be sure, but I don't think there's any additional work to do here.
> I think we're basically getting these "for free"!
I'll leave this one to @elindsey as he mentioned to me some bits that need to be done.
Bump on this one! @elindsey or @petrosagg—what remains to be done here?
I picked Eli's brain on Slack. The outstanding work items are:
- Testing that the formats work in Materialize Cloud.
- Determining whether we need to persist the schemas we read from the Confluent Schema Registry, or whether we're comfortable relying on the upstream registry in perpetuity.
It also occurs to me that the FROM SCHEMA FILE option for Avro/Protobuf formats needs to be removed, because there is no way to upload a schema file in platform. That makes our tests tricky, though...
@nmeagan11 can you coordinate with @bobbyiliev to test the above formats in Materialize Cloud and document (or link the results here)? We need a way to track these with confidence as we move into Previews.
Can we also create or link issues for (1) determining whether we need to persist schemas and (2) deciding how to deal with FROM SCHEMA FILE in the future?
@heeringa Updated the original issue description to reflect the open tasks here.
> @nmeagan11 can you coordinate with @bobbyiliev to test the above formats in Materialize Cloud and document (or link the results here)? We need a way to track these with confidence as we move into Previews.
@heeringa, please see the linked devex issue.
@benesch, can you confirm the status of FROM SCHEMA FILE? Was it removed? @uce, @aljoscha, and I weren't sure!
> @benesch, can you confirm the status of FROM SCHEMA FILE? Was it removed? @uce, @aljoscha, and I weren't sure!
It's gated behind unsafe mode, so we're good for the purposes of GA. It's tech debt we need to clean up at some point though—a bunch of our internal tests still rely on the feature! Details in #13703.
Marking the "Test formats" task as complete after conversation with @bobbyiliev that everything is working as expected.
Marking "Persist upstream schema from Confluent Schema Registry (maybe)" as complete since we decided not to do it (reference).
The remaining items are QA sign-off (cc @philip-stoev) and the tech debt cleanup of FROM SCHEMA FILE.
@nmeagan11 how are you thinking about persisting schemas for the future? Icebox and re-evaluate at every planning cycle based on demand? Something else?
> Icebox and re-evaluate at every planning cycle based on demand?
Exactly!
I don't think we'll ever need to persist schema information to support CREATE SOURCE ... FORMAT AVRO. We added the work item back in the day, before the architecture of platform was as fleshed out. After the linked Slack conversation, I'm pretty convinced that what we're doing is safe.
There is, however, a desire to support a standalone avro_decode function, which would require a standalone CSR source. That's still a bit pie in the sky, but it is tracked in #14133.
I created a separate tracking issue for the FROM SCHEMA FILE cleanup (https://github.com/MaterializeInc/materialize/issues/14911) and removed https://github.com/MaterializeInc/materialize/issues/12304#issuecomment-1134457022 as a blocker to this epic because it's not a priority for our current milestone. Now that all tasks are complete and we have QA sign-off, I think we're ok to close this epic as complete (@uce to confirm)!