coral
[Coral-Schema] Disambiguate nested record namespaces with numeric suffixes
What changes are proposed in this pull request, and why are they necessary?
- Fix Avro namespace collisions that occur when a parent schema contains multiple fields referencing nested records with the same name but different original namespaces. The fix works as follows:
  - Detect collisions per parent record (same record name, different original namespaces).
  - Assign stable numeric suffixes (e.g., "-0", "-1", …) to the reconstructed namespaces of colliding records. Non-colliding records are unchanged.
This approach:
- Prevents fully-qualified name collisions.
- Avoids leaking source/original namespaces into generated schemas.
- Minimizes changes: only colliding records get suffixes; everything else remains the same.
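The detection and suffixing steps can be sketched in Java as follows. This is a minimal, single-level sketch and not the code in this PR. The NamespaceDisambiguator class, the NestedRecord type, and the assignNamespaces method are hypothetical. The sketch also assumes that suffixes are assigned by the order in which each original namespace first appears among the fields, which is one way to keep them stable across runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NamespaceDisambiguator {

  /** Hypothetical model: one parent field that references a nested record. */
  record NestedRecord(String fieldName, String recordName, String originalNamespace) {}

  /**
   * Maps each field name to the namespace its nested record is reconstructed into.
   * baseNamespace is the parent's reconstructed namespace (e.g. "com.app.Parent").
   */
  static Map<String, String> assignNamespaces(String baseNamespace, List<NestedRecord> fields) {
    // Pass 1: for each record name, collect its distinct original namespaces in field order.
    Map<String, List<String>> originalsByName = new LinkedHashMap<>();
    for (NestedRecord f : fields) {
      List<String> originals =
          originalsByName.computeIfAbsent(f.recordName(), k -> new ArrayList<>());
      if (!originals.contains(f.originalNamespace())) {
        originals.add(f.originalNamespace());
      }
    }
    // Pass 2: add a suffix only when a name is shared by records from different original namespaces.
    Map<String, String> result = new LinkedHashMap<>();
    for (NestedRecord f : fields) {
      List<String> originals = originalsByName.get(f.recordName());
      if (originals.size() > 1) {
        // The suffix is the first-seen position of this record's original namespace.
        result.put(f.fieldName(), baseNamespace + "-" + originals.indexOf(f.originalNamespace()));
      } else {
        // No collision: keep the reconstructed namespace unchanged.
        result.put(f.fieldName(), baseNamespace);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<NestedRecord> fields = List.of(
        new NestedRecord("ctxA", "Ctx", "com.foo"),
        new NestedRecord("ctxB", "Ctx", "com.bar"),
        new NestedRecord("meta", "Meta", "com.foo"));
    System.out.println(assignNamespaces("com.app.Parent", fields));
    // {ctxA=com.app.Parent-0, ctxB=com.app.Parent-1, meta=com.app.Parent}
  }
}
```

Under this sketch, two fields that reference a record with the same name and the same original namespace are not treated as a collision. This matches the "different original namespaces" condition.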
Example input (two fields whose nested records share the name "Ctx"):
```json
{
  "type": "record",
  "name": "Parent",
  "namespace": "com.app",
  "fields": [
    { "name": "ctxA", "type": ["null", {"type": "record", "name": "Ctx", "namespace": "com.foo", "fields": []}] },
    { "name": "ctxB", "type": ["null", {"type": "record", "name": "Ctx", "namespace": "com.bar", "fields": []}] }
  ]
}
```
- Before (with collision): both nested records are reconstructed into the same namespace (e.g., "com.app.Parent") with the same name ("Ctx"). Both therefore resolve to the fully qualified name "com.app.Parent.Ctx", which is invalid.
- After (no collision):
  - ctxA's non-null branch: namespace "com.app.Parent-0", name "Ctx"
  - ctxB's non-null branch: namespace "com.app.Parent-1", name "Ctx"
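The before/after contrast depends on fully qualified names. In Avro, a named type's full name is its namespace followed by "." and its name, and a schema cannot define two different record types with the same full name. The minimal Java sketch below checks a list of (namespace, name) pairs for a repeated full name. FullNameCheck, Named, and firstDuplicate are hypothetical helpers, not Coral or Avro APIs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FullNameCheck {

  /** Hypothetical model of a named Avro type: its namespace plus its short name. */
  record Named(String namespace, String name) {
    // Full name = namespace + "." + name, or just the name when there is no namespace.
    String fullName() {
      return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }
  }

  /** Returns the first fully qualified name that appears more than once, or null if all are distinct. */
  static String firstDuplicate(List<Named> types) {
    Set<String> seen = new HashSet<>();
    for (Named t : types) {
      if (!seen.add(t.fullName())) {
        return t.fullName();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    List<Named> before = List.of(new Named("com.app.Parent", "Ctx"), new Named("com.app.Parent", "Ctx"));
    List<Named> after = List.of(new Named("com.app.Parent-0", "Ctx"), new Named("com.app.Parent-1", "Ctx"));
    System.out.println("before: " + firstDuplicate(before)); // before: com.app.Parent.Ctx
    System.out.println("after: " + firstDuplicate(after));   // after: null
  }
}
```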
How was this patch tested?
- Added unit tests in SchemaUtilitiesTests:
  - Verifies collision handling for nullable union fields with the same record name.
  - Verifies collision handling for direct nested record fields (non-union) with the same record name.
  - Asserts distinct namespaces with numeric suffixes (e.g., endsWith("-0") / endsWith("-1")).
- Ran the full module test suite. All existing tests pass without changes to the expected .avsc files.
- Verified manually in spark-shell that the namespace collision is fixed for the affected table.
Thanks for the PR. Based purely on the PR description, should the namespace uniquification happen at the conflicting layer instead? For example:
"com.app.Parent.Ctx0"
"com.app.Parent.Ctx1"
Also, parts of the testing section contain company-specific internal details that can be omitted in this public forum.
@aastha25
This was a mistake in the description. "Ctx" is not part of the namespace; it is the record name, which is where the collision occurs. I've updated the description to fix it.