[Coral-Schema] Disambiguate nested record namespaces with numeric suffixes

Open ruolin59 opened this issue 2 months ago • 2 comments

What changes are proposed in this pull request, and why are they necessary?

  • Fix Avro namespace collisions that occur when a parent schema contains multiple fields referencing nested records with the same name but different original namespaces. The fix:
    • Detects collisions per parent record (same record name, different original namespaces).
    • Assigns stable numeric suffixes to the reconstructed namespace of colliding records (e.g., "-0", "-1", …), leaving non-colliding cases unchanged.

This approach:

  • Prevents fully-qualified name collisions.
  • Avoids leaking source/original namespaces into generated schemas.
  • Minimizes changes: only colliding records get suffixes; everything else remains the same.

Simple example: Input (two fields with the same record name):

{
  "type": "record",
  "name": "Parent",
  "namespace": "com.app",
  "fields": [
    { "name": "ctxA", "type": ["null", {"type":"record","name":"Ctx","namespace":"com.foo","fields":[]}] },
    { "name": "ctxB", "type": ["null", {"type":"record","name":"Ctx","namespace":"com.bar","fields":[]}] }
  ]
}
  • Before (with collision): both nested records are reconstructed into the same namespace (e.g., "com.app.Parent") and have the same name (e.g., "Ctx"), so their fully-qualified names are identical, which is invalid.
  • After (no collision):
    • ctxA non-null type namespace: "com.app.Parent-0", with name "Ctx"
    • ctxB non-null type namespace: "com.app.Parent-1", with name "Ctx"
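
As a rough illustration of the detect-then-suffix steps described above, here is a minimal Java sketch. It uses neither the Avro API nor the actual Coral code: `NamespaceSuffixSketch`, `NestedRef`, and `assignNamespaces` are hypothetical names, and deriving each suffix from field declaration order is only an assumption about how "stable" numbering might be achieved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the Coral implementation.
final class NamespaceSuffixSketch {

  // Hypothetical stand-in for a parent field that references a nested record.
  static final class NestedRef {
    final String fieldName;
    final String recordName;
    final String originalNamespace;

    NestedRef(String fieldName, String recordName, String originalNamespace) {
      this.fieldName = fieldName;
      this.recordName = recordName;
      this.originalNamespace = originalNamespace;
    }
  }

  // Returns field name -> reconstructed namespace for each nested record reference.
  static Map<String, String> assignNamespaces(String parentFullName, List<NestedRef> refs) {
    // Step 1: per parent, collect the distinct original namespaces seen for each record name.
    // LinkedHashSet keeps first-seen (declaration) order, which is assumed to make suffixes stable.
    Map<String, Set<String>> namespacesByName = new LinkedHashMap<>();
    for (NestedRef ref : refs) {
      namespacesByName
          .computeIfAbsent(ref.recordName, k -> new LinkedHashSet<>())
          .add(ref.originalNamespace);
    }

    // Step 2: only record names with more than one original namespace get a suffix.
    Map<String, String> result = new LinkedHashMap<>();
    for (NestedRef ref : refs) {
      List<String> seen = new ArrayList<>(namespacesByName.get(ref.recordName));
      String namespace = parentFullName;
      if (seen.size() > 1) {
        namespace = parentFullName + "-" + seen.indexOf(ref.originalNamespace);
      }
      result.put(ref.fieldName, namespace);
    }
    return result;
  }
}
```

Applied to the Parent example, this sketch maps ctxA to "com.app.Parent-0" and ctxB to "com.app.Parent-1", while a field whose record name is unique within the parent keeps "com.app.Parent" unchanged.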

How was this patch tested?

  • Added unit tests in SchemaUtilitiesTests:
    • Verifies collision handling for nullable union fields with the same record name.
    • Verifies collision handling for direct nested record fields (non-union) with the same record name.
    • Asserts distinct namespaces with numeric suffixes (e.g., endsWith("-0") / endsWith("-1")).
  • Ran full module tests; all existing tests pass without updating expected .avsc files.
  • Verified manually in Spark-shell that the namespace collision was fixed for the offending table.

ruolin59, Nov 10 '25 22:11

Thanks for the PR. Based purely on the PR description, should the namespace uniquification happen at the conflicting layer?

"com.app.Parent.Ctx0"
"com.app.Parent.Ctx1"

Also, part of the "testing done" section contains company-specific internal details, which can be omitted in this public forum.

aastha25, Nov 20 '25 22:11

@aastha25

> Thanks for the PR. Based purely on the PR description, should the namespace uniquification happen at the conflicting layer?
>
> "com.app.Parent.Ctx0"
> "com.app.Parent.Ctx1"

This was a mistake in the description. Ctx is not part of the namespace; it is the record name itself, which is where the collision occurs. I've updated the description to fix this.

ruolin59, Nov 21 '25 19:11