avro icon indicating copy to clipboard operation
avro copied to clipboard

[ruby] Keep all union record errors and tag them accordingly

Open davidgm0 opened this issue 10 months ago • 0 comments

I wasn't able to create an issue in JIRA, since I don't seem to be able to request a user. I am a first time contributor, apologies in case something is not precise enough

What is the purpose of the change

This is a second take on https://github.com/apache/avro/pull/2062 (hopefully better) Improving validation errors for unions of records When no valid union is found, this line selects the first error that it finds. If there would be 3 different possible schemas, there are 3 failures, however only the first one will be considered.

https://github.com/apache/avro/blob/fc2a4e0e5d88755c776b97fa1872e70ffbe0d4bb/lang/ruby/lib/avro/schema_validator.rb#L199

The returned error also does not give any context of the error, only about the invalid fields.

As an example, consider we have 3 types of adresses PersonalAddress, WorkAddress, SecondAddress and we made a typo: We used the field compan instead of company, which belongs to WorkAddress. No type will work for the union. { "compan" => "acme inc.", "some_other_field" => "something else" }

Since PersonalAddress is the first type that fails, the errors of PersonalAddress are the ones that will be returned. Since PersonalAddress has different structure, all fields that aren't the same as in WorkAddress will be shown as errors. The error will be something like

at .address extra field 'compan' - not in schema
at .address extra field 'some_other_field' - not in schema

This gets very confusing because 1) The issue is that no union matched 2) It is not clear because only one of the error is displayed.

I don't know the internals very well, but the existing behaviour also seems flawed because if any of the types is complex, only the complex type errors are listed. I've changed this to list all errors. The error format isn't perfect and I'm happy to accept changes or improve it myself, but I would very much like this to get eventually merged, since the existing validation behaviour for unions is not correct.

Example

This example has the same structure as a real event/schema I'm using. The top level schema is a 'wrapper', and the payload is a 'subschema'. The error is very verbose, but it's necessary given the structure:

Invalid event of type ZooAnonymizedEvent: at .payload expected union of [AnimalAdmittedPayload ('record'), AnimalHabitatCreatedPayload ('record'), ZooPartnershipCreatedPayload ('record'), ZooPartnershipEndedPayload ('record'), AnimalObservationPayload ('record'), AnimalHealthFindingPayload ('record'), ZooInspectionCreatedPayload ('record'), ZooInspectionScheduledPayload ('record'), AnimalTrainingCreatedPayload ('record'), AnimalIncidentPublishedPayload ('record'), AnimalIncidentStateUpdatedPayload ('record'), AnimalIncidentUpdatedPayload ('record'), ZooStatusUpdatedPayload ('record'), ZooActivityUpdatedPayload ('record'), AnimalRecordCreatedPayload ('record'), AnimalRecordDeletedPayload ('record'), AnimalRecordStateUpdatedPayload ('record'), AnimalRecordUpdatedPayload ('record'), ZooSystemIntegrationCreatedPayload ('record'), ZooAnonymizedPayload ('record'), AnimalRescueCancelledRefundPayload ('record'), AnimalRescueCreatedPayload ('record'), AnimalRescueStateUpdatedPayload ('record'), AnimalRescueUpdatedPayload ('record'), ZooKeeperAddedPayload ('record'), ZooKeeperRemovedPayload ('record'), AnimalFeedingScheduledPayload ('record'), ZooKeeperSkillUpdatedPayload ('record')], got record with value {"zoo_id"=>"zoo_ABC123"}

Union type specific errors:   AnimalAdmittedPayload: at .payload.zookeeper_id expected type string, got null; at .payload.animal_id expected type string, got null
  AnimalHabitatCreatedPayload: at .payload.caretaker_id expected type string, got null; at .payload.habitat_name expected type string, got null
  ZooPartnershipCreatedPayload: at .payload.manager_id expected type string, got null; at .payload.partnership_level expected enum with values ["affiliate", "sponsor", "collaborator"], got null; at .payload.partner_zoo_id expected type string, got null; at .payload.created_at expected type long, got null; at .payload extra field 'zoo_id' - not in schema
  ZooPartnershipEndedPayload: at .payload.manager_id expected type string, got null; at .payload.partnership_level expected enum with values ["affiliate", "sponsor", "collaborator"], got null; at .payload.partner_zoo_id expected type string, got null; at .payload.created_at expected type long, got null; at .payload extra field 'zoo_id' - not in schema
(...) 

Verifying this change

This change added tests and can be verified as follows:

  • Modified existing test, existing tests pass

Documentation

  • Does this pull request introduce a new feature? no

davidgm0 avatar Feb 20 '25 16:02 davidgm0