PyAirbyte
PyAirbyte copied to clipboard
fix: Allow composite primary key overrides in PyAirbyte
fix: Allow composite primary keys in PyAirbyte
Summary
Fixed a bug in PyAirbyte's primary key validation that incorrectly rejected composite primary keys like ['id', 'category'] with the error "Nested primary keys are not supported. Each PK column should have exactly one node."
The validation logic in catalog_providers.py was checking if each primary key had exactly one column (len(pk_nodes) != 1), which blocked composite keys. The intent was to prevent nested field references like ['data.id'], not composite keys.
Changes made:
- Modified
get_primary_keys()validation inairbyte/shared/catalog_providers.py - Changed from checking column count to checking for dot notation (nested field indicators)
- Composite keys like
['id', 'category']now work correctly - Nested keys like
['data.id']are still properly rejected
Review & Testing Checklist for Human
⚠️ Risk Level: Medium - Core validation logic change affecting all PyAirbyte users
- [ ] Verify validation logic is correct: Confirm that
any("." in node for node in pk_nodes)properly distinguishes between composite keys (['id', 'category']) and nested keys (['data.id']) - [ ] Test with real connectors: Try setting composite primary keys on actual source connectors to ensure no regressions in production workflows
- [ ] Edge case testing: Test with keys containing special characters, numbers, or unusual naming patterns to ensure robustness
- [ ] Integration test coverage: Check if integration tests need updates to cover composite primary key scenarios
Recommended test plan:
- Run the reproduction script with MotherDuck API key to verify the original issue is resolved
- Test composite primary keys with 2-3 different source connectors
- Verify single-column primary keys still work normally
- Test that truly nested keys (with dots) are still rejected appropriately
Diagram
%%{ init : { "theme" : "default" }}%%
graph TB
repro["reproduce_null_pk_issue.py<br/>(test script)"]:::context
source_base["airbyte/sources/base.py<br/>set_primary_keys()"]:::context
catalog_providers["airbyte/shared/catalog_providers.py<br/>get_primary_keys()"]:::major-edit
test_overrides["tests/unit_tests/sources/<br/>test_source_key_overrides.py"]:::context
repro -->|"calls set_primary_keys(['id', 'category'])"| source_base
source_base -->|"stores as _primary_key_overrides"| catalog_providers
catalog_providers -->|"validates primary key format"| test_overrides
catalog_providers -->|"OLD: len(pk_nodes) != 1<br/>NEW: any('.' in node)"| catalog_providers
subgraph Legend
L1[Major Edit]:::major-edit
L3[Context/No Edit]:::context
end
classDef major-edit fill:#90EE90
classDef context fill:#FFFFFF
Notes
This fix resolves the issue reported by AJ Steers where composite primary keys were incorrectly blocked by PyAirbyte's validation. The change is minimal (2 lines) but affects a core validation function used throughout the system.
Key validation change:
- Before:
if len(pk_nodes) != 1:(rejected any multi-column key) - After:
if any("." in node for node in pk_nodes):(only rejects nested field references)
All existing unit tests pass, including the specific composite primary key test that was previously failing. The reproduction script now successfully processes composite primary keys with both DuckDB and Snowflake destinations.
Session details: Requested by AJ Steers (@aaronsteers) in Devin session: https://app.devin.ai/sessions/bc5912d0a95e499dab8e86e329f1face
Summary by CodeRabbit
- Bug Fixes
- Improved validation for primary keys to ensure only top-level fields are accepted, providing clearer error messages when nested fields are used.