PyAirbyte
fix: Use Pydantic exclude_none=True to omit null namespace fields from catalog serialization
Fix namespace null handling for low-code connectors with S3-data-lake destination
Summary
Fixes PyAirbyte namespace handling issue where low-code connectors (like source-tiktok-marketing and source-snapchat-marketing) fail when writing to destination-s3-data-lake with the error "streams.0.stream.namespace: Null value is not allowed".
The fix adds exclude_none=True to catalog serialization in both destinations/base.py and sources/base.py. With this option, Pydantic omits null fields from the JSON output entirely instead of serializing them as explicit nulls, and the destination CDK accepts omitted fields.
Root cause: When PyAirbyte serializes ConfiguredAirbyteCatalog to JSON for passing to connectors, fields with None values (like namespace) were being serialized as explicit JSON null values ("namespace": null). The destination CDK validation rejects explicit null values but accepts omitted fields.
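The difference can be reproduced in isolation. The following is a minimal sketch, assuming Pydantic v2 (`model_dump_json`). `StreamSketch` is a hypothetical stand-in, not the real Airbyte protocol model, and the actual call sites in destinations/base.py and sources/base.py may look different.

```python
import json
from typing import Optional

from pydantic import BaseModel


class StreamSketch(BaseModel):
    """Hypothetical stand-in for a catalog stream (not the Airbyte model)."""

    name: str
    namespace: Optional[str] = None  # unset for many low-code connectors


stream = StreamSketch(name="ads")

# Default serialization keeps the field and writes an explicit JSON null.
with_nulls = json.loads(stream.model_dump_json())

# exclude_none=True drops any field whose value is None.
without_nulls = json.loads(stream.model_dump_json(exclude_none=True))

print(with_nulls)
print(without_nulls)
```

Only the second form avoids the explicit `"namespace": null` that the destination rejects.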
Prior work: AJ Steers implemented this exact fix in commit 4c5a6d2, but it was lost in subsequent changes.
Review & Testing Checklist for Human
This is a moderate-risk change that requires end-to-end testing:
- [ ] Test with actual low-code connectors: Verify source-tiktok-marketing or source-snapchat-marketing can successfully write to destination-s3-data-lake without the "Null value is not allowed" error
- [ ] Verify JSON serialization behavior: Inspect the actual JSON catalog passed to connectors to confirm null namespace fields are omitted (not serialized as "namespace": null)
- [ ] Regression testing: Test other connector types (non-low-code) writing to various destinations to ensure no new issues are introduced
- [ ] CI verification: Ensure all tests pass, especially any integration tests that involve catalog serialization
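For the JSON inspection item in the checklist, one option is to walk a dumped catalog and list every explicit null. This is a minimal sketch using only the standard library. `find_null_paths` and the sample catalog string are hypothetical and not part of PyAirbyte.

```python
import json
from typing import Any, List


def find_null_paths(node: Any, path: str = "") -> List[str]:
    """Return the dotted path of every explicit null in parsed JSON."""
    if node is None:
        return [path]
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            found.extend(find_null_paths(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            child = f"{path}.{index}" if path else str(index)
            found.extend(find_null_paths(value, child))
    return found


# Hypothetical catalog fragment showing the problematic serialization.
catalog_json = '{"streams": [{"stream": {"name": "ads", "namespace": null}}]}'
null_paths = find_null_paths(json.loads(catalog_json))
print(null_paths)  # ['streams.0.stream.namespace']
```

An empty result for the catalog actually passed to the connector would indicate that null fields are being omitted as intended.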
Notes
- Testing limitation: I was unable to test this fix end-to-end with actual connectors due to lack of credentials
- Targeted change: Only 2 lines modified, adding one parameter to existing function calls
- Reference: Slack thread
- Requested by: @aaronsteers
- Devin session: https://app.devin.ai/sessions/6f299c861622435da6fbd89f3aeba145
Summary by CodeRabbit
- Bug Fixes
  - Catalog JSON output now omits null fields, producing cleaner and smaller files. This improves compatibility with tools that mis-handle null values and reduces noise in diffs and logs. Applies to both emitted catalogs during reads and temporary files generated by destinations. Existing behavior and error handling remain unchanged.