Support parsing JSON
Is your feature request related to a problem? Please describe.
Data Prepper events may have JSON values inside Event fields. Data Prepper should be able to parse these JSON strings and create fields directly in the Event from the JSON.
Describe the solution you'd like
Provide a JSON parsing processor - parse_json.
It should be able to parse a JSON string from a field and set the values in the Event object. This processor will automatically support nesting.
Example
Given the following configuration:
processor:
  - parse_json:
      source: my_field
Given this input event:
"my_field" : "{\"key1\" : \"value1\", \"key2\" : \"value2\"}"
The input event is changed to:
"my_field" : "{\"key1\" : \"value1\", \"key2\" : \"value2\"}"
"key1" : "value1"
"key2" : "value2"
Example with Nesting
Given this input event:
"my_field" : "{\"key1\" : \"value1\", \"key2\" : { \"key2child\" : \"innerValue\" }}"
The input event is changed to:
"my_field" : "{\"key1\" : \"value1\", \"key2\" : \"value2\"}"
"key1" : "value1"
"key2" : {
"key2child" : "innerValue"
}
Configurations
- source - the field containing the JSON string to parse
- target - the field to set the parsed values in; by default, this is the root object
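As a minimal sketch of how these two options might be used together (the target value parsed_json is illustrative only; the list-style processor block follows the usual Data Prepper pipeline layout):

processor:
  - parse_json:
      source: my_field
      target: parsed_json   # illustrative target field name

With a target set, the parsed keys would be written under parsed_json rather than the root of the Event.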
I want to request support for the two special use cases below in the parse_json processor.
- Case 1: Be able to filter based on a JSON path
For example, here is my original JSON file:
{"Records":[
{"key1": "value1", ....},
{"key1": "value2", ....},
...
]}
The expected result after processing is to output multiple records (lines) to destinations (e.g., multiple docs in an OpenSearch index). Similar to the jq tool, I could provide a JSON path like .Records to get only the child fields (see the first sketch after this list).
- Case 2: Support ndjson
For example, the raw file is not itself a valid JSON document, but each line is a valid JSON object.
{"key1": "value1", ....}
{"key1": "value2", ....}
...
The expected result after processing is to output each record (line) to destinations (e.g., multiple docs in an OpenSearch index); see the second sketch after this list.
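For Case 1, a hypothetical configuration sketch is shown below; the source field name (message) and the json_path option are assumptions for illustration only and are not existing parse_json options:

processor:
  - parse_json:
      source: message        # hypothetical field holding the raw JSON shown above
      json_path: .Records    # hypothetical jq-style path; each element of the selected array would become its own record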
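For Case 2, a similar sketch follows; the ndjson flag is purely hypothetical and only meant to show where such an option could live:

processor:
  - parse_json:
      source: message        # hypothetical field holding the newline-delimited JSON
      ndjson: true           # hypothetical flag: parse each line as a separate JSON object and emit one record per line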
I can work on this, and would like to request feedback on another feature for the parse_json processor here.
It may be useful to support using a JSON pointer to select the part of the JSON string that will be parsed. A user could add a pointer option to their parse_json configuration containing a JSON pointer if they wish to process only the part of the JSON string that the pointer selects.
This setting would be optional; if the pointer is not specified or is invalid, then the entire source will be processed. With source: my_field and pointer: /key2/key2child, the example Event:
"my_field" : "{\"key1\" : \"value1\", \"key2\" : { \"key2child\" : \"innerValue\" }}"
is processed into:
"key2child": "innerValue"
If the inner key conflicts with another field on the Event, the absolute path of the inner key will be placed in the destination field (for this example, that is key2/key2child).
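For concreteness, a pipeline sketch of this proposal might look as follows (the option names match the proposal above; the list-style processor block is the usual pipeline layout):

processor:
  - parse_json:
      source: my_field
      pointer: /key2/key2child

Only the portion of the JSON selected by the pointer would be parsed into the Event, as in the example above.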
Alternatively, the JSON pointer could be specified in the source (like source: message/key2/key2child). However, I think that decoupling this feature from the source option makes for a less confusing user experience. The JSON data is not related to what the source field is named, since many sources place their data in an Event field named message, and having to re-specify the JSON pointer whenever the name of the source field changes would be confusing. Elsewhere in Data Prepper, the source option is used only to direct the processor to the field to process, so I suggest that parse_json follow this convention and provide a separate, optional pointer option for parsing based on a JSON pointer.
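To make the comparison concrete, here is a rough sketch of the two alternatives (field names mirror the earlier examples):

# Alternative considered: pointer embedded in the source
processor:
  - parse_json:
      source: message/key2/key2child

# Suggested approach: a separate, optional pointer option
processor:
  - parse_json:
      source: message
      pointer: /key2/key2child

Keeping the pointer separate means the same pointer works unchanged if the source field is later renamed.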