
Support validating a message with JSON schema

Open blake-mealey opened this issue 1 year ago • 3 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

We're using Vector as a usage-event pipeline. We currently define a JSON schema for each of our event types, which we use to validate events at write time. However, we've been considering using Vector as the entry point for some new event types (creating events from an S3 bucket whose data uses a different format). For this case, it would be nice to transform the data into the correct shape and then have the pipeline validate it against a schema to ensure it's correct.

Attempted Solutions

So far we haven't attempted anything, but if first-class support is not added, I think I will attempt to write a tool that generates a VRL script to validate an event against a JSON schema.
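As a rough illustration of what such a generator could look like, here is a minimal Python sketch. It is only an assumption about the approach, not an existing tool: the function names `generate_vrl` and `type_check` are hypothetical, it covers only the `type`, `required`, and `properties` keywords, and it assumes property names are plain identifiers that need no quoting in a VRL path. The emitted VRL relies on the built-in `assert!`, `exists`, and `is_*` type-check functions.

```python
# Hypothetical sketch: turn a small subset of JSON Schema into VRL assertions.
# Only "type", "required", and "properties" are handled.

# JSON Schema primitive types mapped to VRL type-check functions.
VRL_TYPE_CHECKS = {
    "string": "is_string",
    "integer": "is_integer",
    "boolean": "is_boolean",
    "object": "is_object",
    "array": "is_array",
    "null": "is_null",
}


def type_check(path, json_type):
    """Return a VRL boolean expression checking `path` against `json_type`."""
    if json_type == "number":
        # A JSON Schema "number" may be an integer or a float in VRL.
        return f"(is_integer({path}) || is_float({path}))"
    if json_type not in VRL_TYPE_CHECKS:
        raise ValueError(f"unsupported type: {json_type}")
    return f"{VRL_TYPE_CHECKS[json_type]}({path})"


def generate_vrl(schema, path="."):
    """Return a list of VRL source lines that validate `path` against `schema`."""
    lines = []
    json_type = schema.get("type")
    if isinstance(json_type, str):
        check = type_check(path, json_type)
        lines.append(
            f'assert!({check}, message: "{path} must be of type {json_type}")'
        )
    required = set(schema.get("required", []))
    for name, sub_schema in schema.get("properties", {}).items():
        child = f".{name}" if path == "." else f"{path}.{name}"
        child_lines = generate_vrl(sub_schema, child)
        if name in required:
            lines.append(f'assert!(exists({child}), message: "{child} is required")')
            lines.extend(child_lines)
        elif child_lines:
            # Optional properties are only checked when present.
            lines.append(f"if exists({child}) {{")
            lines.extend("  " + line for line in child_lines)
            lines.append("}")
    return lines


example_schema = {
    "type": "object",
    "required": ["alpha"],
    "properties": {
        "alpha": {"type": "string"},
        "beta": {"type": "number"},
    },
}
vrl_lines = generate_vrl(example_schema)
print("\n".join(vrl_lines))
```

The generated script could then be used as the `source` of a remap transform. Keywords such as `enum`, `pattern`, or `$ref` would need considerably more work, which is part of why first-class support would be preferable.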

Proposal

  1. A new global configuration option that defines where to load JSON schemas from (similar to enrichment_tables)
  2. A new VRL function to validate a value against a JSON schema by name

For example, the global configuration may look like:

schemas:
  my_schema:
    type: json_schema
    file_path: /vector-config/schemas/my_schema.json

And the VRL function usage might look like:

is_valid, err = validate_schema(., 'my_schema')

References

No response

Version

No response

blake-mealey • Dec 14 '23 15:12

This would be extremely helpful. I have the same use case.

Freakin • May 17 '24 16:05

Instead of a VRL function, we could easily add a new json_schema condition type. That would work for the common use cases:

  • Conditions can be used in tests. (Most of my test conditions could be replaced with JSON schemas.)
  • Conditions can be used in transforms. In particular, the route transform allows you to handle unmatched events and has out-of-the-box metrics for matched/unmatched events, too.

ghost • Oct 09 '24 23:10

I agree that would be a good solution. I'm picturing the syntax as something like:

my_condition:
  type: json_schema
  # Property path to validate. Optional, defaults to `.`
  path: .nested.property
  # The JSON schema
  schema:
    type: object
    properties:
      alpha:
        type: string

That said, one advantage of supporting this in VRL is that it would significantly reduce the number of transforms needed for a pipeline like mine. With the VRL approach, I could have a single remap transform which checks the event type from the object, then validates it against the appropriate schema. However, with the condition approach, I would need an initial route transform which checks the event type and fans out to individual validation route transforms for each event type.
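To make the single-step idea concrete, here is a small Python sketch of the logic such a remap transform would perform: read the event type, look up the matching schema by name, and validate in one place. Everything in it is an assumption for illustration only: the `event_type` field, the `SCHEMAS` registry and its schema names, and the tiny validator, which supports only `type`, `required`, and `properties`. The `validate_schema` function mirrors the `(is_valid, err)` shape of the proposed VRL function but is not an existing Vector API.

```python
from typing import Any, Optional, Tuple

# Type predicates for the JSON Schema primitive types supported by this sketch.
_TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
}

# Hypothetical registry, standing in for schemas loaded from global config.
SCHEMAS = {
    "signup": {
        "type": "object",
        "required": ["user_id"],
        "properties": {"user_id": {"type": "string"}},
    },
    "purchase": {
        "type": "object",
        "required": ["user_id", "amount"],
        "properties": {
            "user_id": {"type": "string"},
            "amount": {"type": "number"},
        },
    },
}


def _check(value: Any, schema: dict, path: str = ".") -> Optional[str]:
    """Return the first validation error found, or None if the value is valid."""
    json_type = schema.get("type")
    if json_type is not None and not _TYPE_CHECKS[json_type](value):
        return f"{path} must be of type {json_type}"
    if isinstance(value, dict):
        prefix = path.rstrip(".")
        for name in schema.get("required", []):
            if name not in value:
                return f"{prefix}.{name} is required"
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                err = _check(value[name], sub_schema, f"{prefix}.{name}")
                if err is not None:
                    return err
    return None


def validate_schema(value: Any, name: str) -> Tuple[bool, Optional[str]]:
    """Validate `value` against the named schema, returning (is_valid, err)."""
    schema = SCHEMAS.get(name)
    if schema is None:
        return False, f"unknown schema: {name}"
    err = _check(value, schema)
    return err is None, err


def validate_event(event: dict) -> Tuple[bool, Optional[str]]:
    """Pick the schema from the event's own type field, then validate once."""
    return validate_schema(event, str(event.get("event_type")))
```

With this shape, adding a new event type means adding one registry entry rather than another validation transform, which is the reduction in pipeline components described above.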

Using a VRL JSON schema check:

flowchart TD
    source --> verify_all_event_types -->|fail| verify_all_event_types._unmatched
    verify_all_event_types._unmatched --> dlq_sink
    verify_all_event_types --->|pass| valid_sink_1 & valid_sink_2

Using a condition JSON schema check:

flowchart TD
    source --> route_by_event_type
    route_by_event_type -->|event_type is 1| verify_event_type_1 -->|fail| verify_event_type_1._unmatched
    route_by_event_type -->|event_type is 2| verify_event_type_2 -->|fail| verify_event_type_2._unmatched
    route_by_event_type -->|event_type is 3| verify_event_type_3 -->|fail| verify_event_type_3._unmatched
    verify_event_type_1._unmatched --> dlq_sink
    verify_event_type_2._unmatched --> dlq_sink
    verify_event_type_3._unmatched --> dlq_sink
    verify_event_type_1 --->|pass| valid_sink_1 & valid_sink_2
    verify_event_type_2 --->|pass| valid_sink_1 & valid_sink_2
    verify_event_type_3 --->|pass| valid_sink_1 & valid_sink_2

It does sound like it would be easier to implement the condition though. Maybe we could start with that and consider implementing the VRL check later?

blake-mealey • Oct 10 '24 14:10