
fix: Use Pydantic exclude_none=True to omit null namespace fields from catalog serialization

Open aaronsteers opened this issue 1 month ago • 8 comments

Fix namespace null handling for low-code connectors with S3-data-lake destination

Summary

Fixes a PyAirbyte namespace-handling issue in which low-code connectors (such as source-tiktok-marketing and source-snapchat-marketing) fail when writing to destination-s3-data-lake with the error "streams.0.stream.namespace: Null value is not allowed".

The fix adds exclude_none=True to catalog serialization in both destinations/base.py and sources/base.py. With this flag, Pydantic omits None-valued fields from the JSON output entirely instead of serializing them as explicit nulls, and the destination CDK accepts omitted fields.

Root cause: When PyAirbyte serializes ConfiguredAirbyteCatalog to JSON for passing to connectors, fields with None values (like namespace) were being serialized as explicit JSON null values ("namespace": null). The destination CDK validation rejects explicit null values but accepts omitted fields.
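The difference between the two serializations can be sketched with a small stand-in model. This is an illustration only, assuming Pydantic v2: the Stream, ConfiguredStream, and Catalog classes below are hypothetical simplifications, not the real Airbyte protocol models, and the snippet does not reproduce the actual code in destinations/base.py or sources/base.py.

```python
import json
from typing import List, Optional

from pydantic import BaseModel


# Hypothetical, simplified stand-ins for the Airbyte catalog models.
class Stream(BaseModel):
    name: str
    namespace: Optional[str] = None  # low-code connectors may leave this unset


class ConfiguredStream(BaseModel):
    stream: Stream
    sync_mode: str = "full_refresh"


class Catalog(BaseModel):
    streams: List[ConfiguredStream]


catalog = Catalog(streams=[ConfiguredStream(stream=Stream(name="ads"))])

# Default serialization keeps the key and writes an explicit JSON null.
with_nulls = catalog.model_dump_json()

# exclude_none=True drops every None-valued field from the output.
without_nulls = catalog.model_dump_json(exclude_none=True)

print(with_nulls)
print(without_nulls)

# Parse both back to confirm whether the "namespace" key is present.
stream_with = json.loads(with_nulls)["streams"][0]["stream"]
stream_without = json.loads(without_nulls)["streams"][0]["stream"]
print("namespace" in stream_with, "namespace" in stream_without)
```

Note that exclude_none applies to every field in the model tree, not only namespace, so any other None-valued field is omitted as well.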

Prior work: This exact fix was implemented in commit 4c5a6d2 by AJ Steers but was somehow lost in subsequent changes.

Review & Testing Checklist for Human

This is a moderate-risk change that requires end-to-end testing:

  • [ ] Test with actual low-code connectors: Verify source-tiktok-marketing or source-snapchat-marketing can successfully write to destination-s3-data-lake without the "Null value is not allowed" error
  • [ ] Verify JSON serialization behavior: Inspect the actual JSON catalog passed to connectors to confirm null namespace fields are omitted (not serialized as "namespace": null)
  • [ ] Regression testing: Test other connector types (non-low-code) writing to various destinations to ensure no new issues are introduced
  • [ ] CI verification: Ensure all tests pass, especially any integration tests that involve catalog serialization
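For the JSON inspection item above, one possible approach is a small standard-library helper that walks a serialized catalog and reports the path of every explicit null. This is a sketch, not part of PyAirbyte: find_null_paths is a hypothetical name, and the sample catalog strings are made up for illustration. Paths are joined with dots so they line up with the format of the error message ("streams.0.stream.namespace").

```python
import json
from typing import Any, Iterator


def find_null_paths(node: Any, path: str = "") -> Iterator[str]:
    """Yield the dotted path of every explicit null in a parsed JSON document."""
    if node is None:
        yield path or "<root>"
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_null_paths(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_null_paths(item, f"{path}.{index}" if path else str(index))


# Made-up catalog text as it might look before and after the fix.
before_fix = '{"streams": [{"stream": {"name": "ads", "namespace": null}}]}'
after_fix = '{"streams": [{"stream": {"name": "ads"}}]}'

nulls_before = list(find_null_paths(json.loads(before_fix)))
nulls_after = list(find_null_paths(json.loads(after_fix)))

print(nulls_before)
print(nulls_after)
```

An empty result for the catalog file actually passed to the connector would indicate that no field was serialized as an explicit null.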

Notes

  • Testing limitation: I was unable to test this fix end-to-end with actual connectors due to lack of credentials
  • Targeted change: Only 2 lines modified, adding one parameter to existing function calls
  • Reference: Slack thread
  • Requested by: @aaronsteers
  • Devin session: https://app.devin.ai/sessions/6f299c861622435da6fbd89f3aeba145

Summary by CodeRabbit

  • Bug Fixes
    • Catalog JSON output now omits null fields, producing cleaner and smaller files. This improves compatibility with tools that mis-handle null values and reduces noise in diffs and logs. Applies to both emitted catalogs during reads and temporary files generated by destinations. Existing behavior and error handling remain unchanged.

aaronsteers, Oct 07 '25 21:10