
Limit the depth of parsing json

Open colorado666 opened this issue 2 years ago • 2 comments

Which OpenObserve functionalities are relevant/related to the feature request?

log search

Description

Thank you for developing such a great product. Thank you, Coldstar, for your patience and help.

Our log content contains many different structures. We use zap to print the logs, and each log entry is output as JSON. There are a lot of these structures, different structures share the same field names, and the nesting depth is generally 3 to 4 levels. A data example is as follows:

./gw.log.20230712:21080:{"level":"debug","ts":"2023-07-12 10:00:15","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0061997204","category":"48H","fn":52,"fn34":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":11.96,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21082:{"level":"debug","ts":"2023-07-12 10:00:15","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0061997204","data":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":11.96,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21105:{"level":"debug","ts":"2023-07-12 10:00:49","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0012340091","category":"48H","fn":52,"fn34":{"rain_sample":[0.1,0.2,0,0,0.1,0,0.1,0,0,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.82,256.04,256.71,256.34,257.85,256,256.87,257.94,257.83,256.1,256.69,257.5],"water_sample_timestamp":1689127200,"volage_sample":13.6,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21107:{"level":"debug","ts":"2023-07-12 10:00:49","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0012340091","data":{"rain_sample":[0.1,0.2,0,0,0.1,0,0.1,0,0,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.82,256.04,256.71,256.34,257.85,256,256.87,257.94,257.83,256.1,256.69,257.5],"water_sample_timestamp":1689127200,"volage_sample":13.6,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21124:{"level":"debug","ts":"2023-07-12 10:00:58","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0012340091","category":"48H","fn":52,"fn34":{"rain_sample":[0,0.2,0.1,0,0,0.2,0,0,0.2,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.09,258.31,256.33,257.64,258.27,256.95,257.43,256.84,258.49,256.89,256.12,257.33],"water_sample_timestamp":1689127200,"volage_sample":13.09,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21126:{"level":"debug","ts":"2023-07-12 10:00:58","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0012340091","data":{"rain_sample":[0,0.2,0.1,0,0,0.2,0,0,0.2,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.09,258.31,256.33,257.64,258.27,256.95,257.43,256.84,258.49,256.89,256.12,257.33],"water_sample_timestamp":1689127200,"volage_sample":13.09,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21145:{"level":"debug","ts":"2023-07-12 10:01:44","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0061997329","category":"48H","fn":52,"fn34":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":12.12,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21147:{"level":"debug","ts":"2023-07-12 10:01:44","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0061997329","data":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":12.12,"volage_sample_timestamp":1689127200}}

When OpenObserve processes these logs, it unpacks all of the nested JSON and generates a large number of fields/indexes, and the whole search becomes very slow.

Proposed solution

It is recommended to add a parameter that limits the JSON parsing depth: either parse everything, or parse only the first level.
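A rough Go sketch of the kind of depth-limited flattening being requested (illustrative only; the maxDepth parameter and function names are hypothetical, not an existing OpenObserve option):

package main

import (
    "encoding/json"
    "fmt"
)

// flatten expands nested objects into prefixed keys ("fn34_volage_sample", ...)
// but stops recursing once maxDepth is reached; anything deeper is kept as a
// raw JSON string. maxDepth=1 corresponds to "only parse the first level".
func flatten(prefix string, value interface{}, depth, maxDepth int, out map[string]interface{}) {
    obj, isObj := value.(map[string]interface{})
    _, isArr := value.([]interface{})
    if !isObj || depth >= maxDepth {
        if isObj || isArr {
            raw, _ := json.Marshal(value)
            out[prefix] = string(raw) // keep the nested value as a single string field
        } else {
            out[prefix] = value
        }
        return
    }
    for k, v := range obj {
        key := k
        if prefix != "" {
            key = prefix + "_" + k
        }
        flatten(key, v, depth+1, maxDepth, out)
    }
}

func main() {
    line := `{"station_sn":"0061997204","fn":52,"fn34":{"volage_sample":11.96,"rain_sample":[0,0,0]}}`
    var record map[string]interface{}
    _ = json.Unmarshal([]byte(line), &record)

    out := map[string]interface{}{}
    flatten("", record, 0, 1, out) // parse only the first level
    fmt.Println(out)               // fn34 stays one string field instead of many
}

With maxDepth set high enough, the same function produces the fully flattened fn34_* fields, so a single knob could cover both behaviors.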

Alternatives considered

For the time being, our only workaround is to serialize the nested part of the log to a string first, so that zinc recognizes it as a string instead of JSON.
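A sketch of that workaround on the zap side (pre-serializing the nested payload so the ingester only sees one string field; names and values are illustrative):

package main

import (
    "encoding/json"

    "go.uber.org/zap"
)

func main() {
    logger, _ := zap.NewProduction() // JSON encoder
    defer logger.Sync()

    // Marshal the nested payload ourselves and log it as a plain string,
    // so the stored field is a string rather than a nested JSON object.
    fn34 := map[string]interface{}{"volage_sample": 11.96, "rain_sample_timestamp": 1689127200}
    raw, err := json.Marshal(fn34)
    if err != nil {
        logger.Error("marshal fn34 failed", zap.Error(err))
        return
    }

    logger.Info("handleFN34",
        zap.String("stationSN", "0061997204"),
        zap.String("data", string(raw))) // "data" arrives as one string field
}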

colorado666 avatar Jul 12 '23 04:07 colorado666

sample data:

{
  "station_sn":"0061997204","category":"48H","fn":52,
  "fn34":{
    "rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],
    "rain_sample_timestamp":1689127200,
    "rain_accumu_timestamp":1689127200,
    "water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],
    "water_sample_timestamp":1689127200,
    "volage_sample":11.96,
    "volage_sample_timestamp":1689127200
  }
}

prabhatsharma avatar Jul 12 '23 12:07 prabhatsharma

This can be handled using https://github.com/openobserve/openobserve/issues/1081

prabhatsharma avatar Jul 12 '23 12:07 prabhatsharma

We have already made some changes for this issue: now we only flatten objects and skip arrays. The result looks like this:

Original data:

{
  "station_sn":"0061997204","category":"48H","fn":52,
  "fn34":{
    "rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],
    "rain_sample_timestamp":1689127200,
    "rain_accumu_timestamp":1689127200,
    "water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],
    "water_sample_timestamp":1689127200,
    "volage_sample":11.96,
    "volage_sample_timestamp":1689127200
  }
}

Ingested data in OpenObserve:

{
    "_timestamp": 1691129227148152,
    "category": "48H",
    "fn": 52,
    "fn34_rain_accumu_timestamp": 1689127200,
    "fn34_rain_sample": "[0,0,0,0,0,0,0,0,0,0,0,0]",
    "fn34_rain_sample_timestamp": 1689127200,
    "fn34_volage_sample": 11.96,
    "fn34_volage_sample_timestamp": 1689127200,
    "fn34_water_sample": "[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35]",
    "fn34_water_sample_timestamp": 1689127200,
    "station_sn": "0061997204"
}

Details in #1323
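For readers following along, a rough Go sketch of this "flatten objects, keep arrays as strings" behavior (OpenObserve's actual implementation is in Rust in the PR above; this is only an illustration of the transformation shown in the example):

package main

import (
    "encoding/json"
    "fmt"
)

// flattenObjects recursively expands nested objects into prefix_key fields,
// while arrays are serialized back to a JSON string, matching the ingested
// record shown above (e.g. fn34_rain_sample becomes the string "[0,0,0]").
func flattenObjects(prefix string, value interface{}, out map[string]interface{}) {
    switch v := value.(type) {
    case map[string]interface{}:
        for k, child := range v {
            key := k
            if prefix != "" {
                key = prefix + "_" + k
            }
            flattenObjects(key, child, out)
        }
    case []interface{}:
        raw, _ := json.Marshal(v)
        out[prefix] = string(raw) // arrays are skipped, i.e. kept as string values
    default:
        out[prefix] = v
    }
}

func main() {
    line := `{"station_sn":"0061997204","fn34":{"rain_sample":[0,0,0],"volage_sample":11.96}}`
    var record map[string]interface{}
    _ = json.Unmarshal([]byte(line), &record)

    out := map[string]interface{}{}
    flattenObjects("", record, out)
    fmt.Println(out) // map[fn34_rain_sample:[0,0,0] fn34_volage_sample:11.96 station_sn:0061997204]
}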

hengfeiyang avatar Aug 04 '23 06:08 hengfeiyang

Wow, such great news @hengfeiyang! 🥳

Can we use DataFusion SQL scalar functions on these new arrays? Like array_has?

IOITI avatar Aug 12 '23 21:08 IOITI

@IOITI In the backend, the array is stored as a string. For example:

record:

{"field1":"I am a message", "field2": [1,2,3,4]}

In storage, there will be two fields:

  • field1: i am a message
  • field2: "[1,2,3,4]", just a string value. It can't be searched as an array, but it can be searched as a string.
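In other words, a query can only match the array by its text representation. A tiny Go illustration of what "search as a string" amounts to (assuming the stored value is the literal JSON text):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // field2 as stored: the literal JSON text of the array, not an array type.
    stored := "[1,2,3,4]"

    fmt.Println(strings.Contains(stored, "2,3")) // true – substring match works
    fmt.Println(stored == "[1,2,3,4]")           // true – exact string match works
    // There is no element-wise lookup (e.g. array_has) on this field,
    // because the backend never sees it as an array.
}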

hengfeiyang avatar Aug 13 '23 03:08 hengfeiyang