
[Coral-Schema] default value lowercasing

Open ruolin59 opened this issue 2 months ago • 0 comments

What changes are proposed in this pull request, and why are they necessary?

This PR fixes a bug in ToLowercaseSchemaVisitor where field names inside complex default values were not lowercased. The default value then no longer matched the lowercased field names of its schema, and creating the field failed schema validation.

Given an input schema with a record field that has a default value:

{
  "name": "Struct_Field",
  "type": {
    "type": "record",
    "name": "NestedRecord",
    "fields": [
      {"name": "firstName", "type": "string"},
      {"name": "Age", "type": "int"}
    ]
  },
  "default": {
    "firstName": "John",
    "Age": 30
  }
}

Before the fix:

{
  "name": "struct_field",
  "type": {
    "type": "record",
    "name": "nestedrecord",
    "fields": [
      {"name": "firstname", "type": "string"},
      {"name": "age", "type": "int"}
    ]
  },
  "default": {
    "firstName": "John",  // Mismatched casing
    "Age": 30             // Mismatched casing
  }
}

Result: AvroTypeException: Invalid default for field struct_field

After the fix:

{
  "name": "struct_field",
  "type": {
    "type": "record",
    "name": "nestedrecord",
    "fields": [
      {"name": "firstname", "type": "string"},
      {"name": "age", "type": "int"}
    ]
  },
  "default": {
    "firstname": "John",  // Correctly lowercased
    "age": 30             // Correctly lowercased
  }
}

The solution: added a lowercaseDefaultValue() method that recursively transforms default values based on the schema type:

  • RECORD types: Lowercases field names in record default values
  • MAP types: Lowercases all keys in map default values
  • ARRAY types: Recursively processes each array element
  • Primitives: Returns unchanged

The implementation handles both GenericData.Record and Map-based default value representations.
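
The per-type rules above can be illustrated with a minimal, dependency-free sketch. It is not the code from this PR. The `Kind` enum and `Node` class are hypothetical stand-ins for Avro's schema types, defaults are modeled only as plain `Map` and `List` values (the GenericData.Record representation is not covered), and recursing into map values is an assumption based on the "recursively transforms" wording.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class LowercaseDefaultSketch {

  /** Hypothetical stand-in for the schema types named in the list above. */
  enum Kind { RECORD, MAP, ARRAY, PRIMITIVE }

  /** Hypothetical minimal schema node; this is NOT Avro's Schema class. */
  static final class Node {
    final Kind kind;
    final Map<String, Node> fields; // RECORD only: original field name -> field schema
    final Node valueType;           // MAP / ARRAY only: value or element schema

    private Node(Kind kind, Map<String, Node> fields, Node valueType) {
      this.kind = kind;
      this.fields = fields;
      this.valueType = valueType;
    }

    static Node primitive() { return new Node(Kind.PRIMITIVE, null, null); }
    static Node record(Map<String, Node> fields) { return new Node(Kind.RECORD, fields, null); }
    static Node map(Node valueType) { return new Node(Kind.MAP, null, valueType); }
    static Node array(Node elementType) { return new Node(Kind.ARRAY, null, elementType); }
  }

  /** Returns a copy of the default value with record field names and map keys lowercased. */
  static Object lowercaseDefaultValue(Object value, Node schema) {
    if (value == null) {
      return null;
    }
    switch (schema.kind) {
      case RECORD: {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
          String name = String.valueOf(e.getKey());
          // Assumption: the schema node still carries the original-case field names.
          Node fieldSchema = schema.fields.get(name);
          Object v = fieldSchema == null
              ? e.getValue()
              : lowercaseDefaultValue(e.getValue(), fieldSchema);
          out.put(name.toLowerCase(Locale.ROOT), v);
        }
        return out;
      }
      case MAP: {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
          String key = String.valueOf(e.getKey()).toLowerCase(Locale.ROOT);
          out.put(key, lowercaseDefaultValue(e.getValue(), schema.valueType));
        }
        return out;
      }
      case ARRAY: {
        List<Object> out = new ArrayList<>();
        for (Object element : (List<?>) value) {
          out.add(lowercaseDefaultValue(element, schema.valueType));
        }
        return out;
      }
      default:
        return value; // primitives are returned unchanged
    }
  }

  public static void main(String[] args) {
    // Mirrors the Struct_Field example: a record with fields firstName and Age.
    Map<String, Node> fields = new LinkedHashMap<>();
    fields.put("firstName", Node.primitive());
    fields.put("Age", Node.primitive());

    Map<String, Object> defaultValue = new LinkedHashMap<>();
    defaultValue.put("firstName", "John");
    defaultValue.put("Age", 30);

    System.out.println(lowercaseDefaultValue(defaultValue, Node.record(fields)));
    // prints {firstname=John, age=30}
  }
}
```

The sketch builds a new value instead of mutating its input, so the original default is left untouched. Whether the actual method copies or mutates is not stated in this description.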

How was this patch tested?

Added a new test testLowercaseSchemaWithComplexDefaultValues() in SchemaUtilitiesTests that verifies the fix handles:

  1. Simple primitive default values (no change needed)
  2. Nested record default values with mixed-case field names
  3. Map default values with mixed-case keys
  4. Array default values containing records with mixed-case field names

The test uses input/expected schema files:

  • testLowercaseSchemaWithDefaultValues-input.avsc - Contains fields with various default values using mixed case
  • testLowercaseSchemaWithDefaultValues-expected.avsc - Contains the fully lowercased expected output

All existing tests continue to pass, confirming backward compatibility.

ruolin59, Nov 06 '25 19:11