fhir_models icon indicating copy to clipboard operation
fhir_models copied to clipboard

FI-202: Refactor Validation

Open radamson opened this issue 5 years ago • 4 comments

FHIR Resource Validation

Overview

The FHIR::Validation module provides FHIR Resource Validation capabilities which are intended to improve maintainability and extensibility.

See FI-202

Background

The current state of validation is complicated and costly to maintain. Currently there are two validation implementations in fhir_models.

  1. FHIR::Model contains an implementation that largely relies on the class METADATA constant and defines valid?, validate.
  2. FHIR::StructureDefinition contains an implementation that relies on the underlying StructureDefintion and defines validate_resource and validates_resource?

They each have their own implementation of how things like cardinality adherence are validated.

Cardinality Implementations

FHIR::Model#validate

unless count >= meta['min'] && count <= meta['max']
  errors[field] << "#{meta['path']}: invalid cardinality. Found #{count} expected #{meta['min']}..#{meta['max']}"
end

FHIR::StructureDefinition#validate_resource

def verify_cardinality(element, nodes)
  # Check the cardinality
  min = element.min
  max = element.max == '*' ? Float::INFINITY : element.max.to_i
  @errors << "#{describe_element(element)} failed cardinality test (#{min}..#{max}) -- found #{nodes.size}" if (nodes.size < min) || (nodes.size > max)
end

Some validation checks are only in one and not the other. For example, the maxLength is validated in the FHIR::StructureDefinition implementation, but not in the FHIR::Model implementation.

Each implementation also returns results in different ways, for example if we were to validate a Patient resource:

patient = FHIR::Patient.new(telecom: [{system: 'foo', value: ['bar' , 'baz'], period: [{start:[1,2]}, {start: '2019'}, {start:[3,4]}]}, {system: 'corge', code: 'qux'}])
patient.validate

Returns

{"telecom"=>
  [{"system"=>["ContactPoint.system: invalid codes [\"foo\"]"],
    "value"=>["ContactPoint.value: invalid cardinality. Found 2 expected 0..1"],
    "period"=>
     ["ContactPoint.period: invalid cardinality. Found 2 expected 0..1",
      {"start"=>
        ["Period.start: invalid cardinality. Found 2 expected 0..1",
         "Period.start: 1 does not match dateTime regex",
         "Period.start: 2 does not match dateTime regex"]},
      {"start"=>
        ["Period.start: invalid cardinality. Found 2 expected 0..1",
         "Period.start: 3 does not match dateTime regex",
         "Period.start: 4 does not match dateTime regex"]}]},
   {"system"=>["ContactPoint.system: invalid codes [\"corge\"]"]}]}

The results are organized into a hash whose structure resembles that of a Patient Resource.

It is not immediately obvious which element in the resource contains the error in a collection of elements (In this example the first and third Period.start have cardinality issues). The arrays of results in the hash are also heterogeneous, containing both strings for errors and hashes for the sub elements. Traversing to a nested error requires requires searching through the array and navigating down multiple subtrees. For example, there are two nested hashes in the array which contain errors for the Period.start elements.

The resource can also be validated against the StructureDefinition directly.

patient_structure_def = FHIR::Definitions.resource_definition('Patient')
patient_structure_def.validate_resource(patient)

This actually returns no errors or warnings, which could be misleading. This validator only the elements in the differential which are:

"Patient",
"Patient.identifier",
"Patient.active",
"Patient.name",
"Patient.telecom",
"Patient.gender",
"Patient.birthDate",
"Patient.deceased[x]",
"Patient.address",
"Patient.maritalStatus",
"Patient.multipleBirth[x]",
"Patient.photo",
"Patient.contact",
"Patient.contact.relationship",
"Patient.contact.name",
"Patient.contact.telecom",
"Patient.contact.address",
"Patient.contact.gender",
"Patient.contact.organization",
"Patient.contact.period",
"Patient.communication",
"Patient.communication.language",
"Patient.communication.preferred",
"Patient.generalPractitioner",
"Patient.managingOrganization",
"Patient.link",
"Patient.link.other",
"Patient.link.type"

Updating the patient with errors in these elements and then validating the patient returns:

patient = FHIR::Patient.new(telecom: [{system: 'foo', value: ['bar' , 'baz'], period: [{start:[1,2]}, {start: '2019'}, {start:[3,4]}]}, {system: 'corge', code: 'qux'}])
patient_structure_def.validate_resource(patient)
["Patient.gender has invalid code 'attack helicopter' from http://hl7.org/fhir/ValueSet/administrative-gender",
 "Patient.gender did not match one of the valid data types: [\"code\"]",
 "Patient.contact: FluentPath expression evaluates to false for Patient invariant rule pat-1: SHALL at least contain a contact's details or a reference to an organization",
 "{\"gender\"=>\"fighter jet\"}",
 "Patient.contact.gender has invalid code 'fighter jet' from http://hl7.org/fhir/ValueSet/administrative-gender",
 "Patient.contact.gender did not match one of the valid data types: [\"code\"]"]

These errors are presented as strings in a flat array, even for nested elements like Patient.contact.gender. . Additionally, some errors are not detected, such as the invalid cardinality on Patient.telecom[0].value, Patient.telecom.period which are detected in the StructureDefinition validation.

Warnings are not directly returned, but can be retrieved from the StructureDefinition.

patient_structure_def.warnings

Refactoring Validation

The FHIR::Validation attempts to unify the validation code from both the FHIR::Model and FHIR::StructureDefintion so that it is easier to maintain and update. The module has two "types" of validators: StructureValidators and ElementValidators.

The StructureValidator contains a reference to a profile which contains a number of ElementDefinitions. The StructureValidator constructs and traverses a hierarchy of thses ElementDefinitions to which it applies an ElementValidator.

The ElementValidator implements a validate method which validates the elements in the resource and returns an array of ValidationResults which indicate the validation type and specific element path where the error occurred (e.g. Patient.telecom[0])

These validation results can then be more easily filtered.

results = patient.validate

# Get cardinality results
results.select {|res| res.validation_type == :cardinality}

# Get all failing tests
results.select {|res| res.result == :fail}

The validation results indicate successful and failing validation results, so that it is more precisely known what is checked so that it is not assumed that a resource "passed" when in reality something was just not checked.

The validate method in FHIR::Model and validate_resource methods in FHIR::StructureDefinition delegate validation to the validation module to return similar validation results. The FHIR::Models validation defaults to using the StructureDefinition of the base resource.

A custom validator can also be constructed to use specific ElementValidators if only certain validations are desired, like cardinality or terminology.

Currently ElementValidators are provided for: cardinality, complex types, fixed values, max string length, and terminology.

Things Missing

This is not expected to be the end all for validation, but the first major refactor which allows updates to be added in a more consistent way and in a single place.

Additional validators for patterns, regular expression matching, contraints and others are needed. Some of these checks were partially implemented in one or more of the existing validators, but left until a more complete and consistent implementation is ready.

radamson avatar Aug 08 '19 21:08 radamson

This PR is just huge, so I apologize if I missed it, but where are we validating the primitives? Is that not included any longer because this does not include the regex validation?

jawalonoski avatar Aug 16 '19 16:08 jawalonoski

I was working on doing regex validation based off the StructureDefinition, but it was a little tricky because of some issues in the spec combined with the fact that we don't handle primitive extensions. It should be added back in, but holding onto this PR and making it even larger seemed unwise.

See: https://chat.fhir.org/#narrow/stream/179166-implementers/topic/extension-regex.20usage

radamson avatar Aug 16 '19 18:08 radamson

Well, we still need to validate that Booleans are Booleans, and Strings are Strings, Dates and DateTimes are correct, and different types of numerics are correct. If you don't do that, then the validation could be crazy off --- regardless of extensions.

jawalonoski avatar Aug 16 '19 18:08 jawalonoski

https://github.com/fhir-crucible/plan_executor/pull/174

radamson avatar Mar 27 '20 12:03 radamson