
Limit the depth of parsing json

Open colorado666 opened this issue 2 years ago • 2 comments

Which OpenObserve functionalities are relevant/related to the feature request?

log search

Description

Thank you for developing such a great product. Thank you, Coldstar, for your patience and help.

Our log content contains many different structures. We use zap to print the logs, and each log entry is output as JSON. There are a lot of these structures, different structures share the same field names, and the nesting depth is generally 3 to 4 levels. A data example is as follows:

./gw.log.20230712:21080:{"level":"debug","ts":"2023-07-12 10:00:15","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0061997204","category":"48H","fn":52,"fn34":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":11.96,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21082:{"level":"debug","ts":"2023-07-12 10:00:15","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0061997204","data":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":11.96,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21105:{"level":"debug","ts":"2023-07-12 10:00:49","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0012340091","category":"48H","fn":52,"fn34":{"rain_sample":[0.1,0.2,0,0,0.1,0,0.1,0,0,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.82,256.04,256.71,256.34,257.85,256,256.87,257.94,257.83,256.1,256.69,257.5],"water_sample_timestamp":1689127200,"volage_sample":13.6,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21107:{"level":"debug","ts":"2023-07-12 10:00:49","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0012340091","data":{"rain_sample":[0.1,0.2,0,0,0.1,0,0.1,0,0,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.82,256.04,256.71,256.34,257.85,256,256.87,257.94,257.83,256.1,256.69,257.5],"water_sample_timestamp":1689127200,"volage_sample":13.6,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21124:{"level":"debug","ts":"2023-07-12 10:00:58","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0012340091","category":"48H","fn":52,"fn34":{"rain_sample":[0,0.2,0.1,0,0,0.2,0,0,0.2,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.09,258.31,256.33,257.64,258.27,256.95,257.43,256.84,258.49,256.89,256.12,257.33],"water_sample_timestamp":1689127200,"volage_sample":13.09,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21126:{"level":"debug","ts":"2023-07-12 10:00:58","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0012340091","data":{"rain_sample":[0,0.2,0.1,0,0,0.2,0,0,0.2,0,0.2,0.1],"rain_sample_timestamp":1689127200,"rain_accumu":128,"rain_accumu_timestamp":1689127200,"water_sample":[256.09,258.31,256.33,257.64,258.27,256.95,257.43,256.84,258.49,256.89,256.12,257.33],"water_sample_timestamp":1689127200,"volage_sample":13.09,"volage_sample_timestamp":1689127200}}
./gw.log.20230712:21145:{"level":"debug","ts":"2023-07-12 10:01:44","caller":"iotmqtt/mqtt_sl651.go:24","msg":"","service":"gw","body":{"station_sn":"0061997329","category":"48H","fn":52,"fn34":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":12.12,"volage_sample_timestamp":1689127200}}}
./gw.log.20230712:21147:{"level":"debug","ts":"2023-07-12 10:01:44","caller":"iotmqtt/mqtt_sl651.go:226","msg":"handleFN34","service":"gw","stationSN":"0061997329","data":{"rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],"rain_sample_timestamp":1689127200,"rain_accumu_timestamp":1689127200,"water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],"water_sample_timestamp":1689127200,"volage_sample":12.12,"volage_sample_timestamp":1689127200}}

When OpenObserve processes these logs, it unpacks all of the nested JSON and generates a large number of fields/indexes, and the whole search becomes very slow.

Proposed solution

It is recommended to add a parameter that limits the JSON parsing depth: either parse everything, or parse only the first level.
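A rough Go sketch of the kind of depth-limited flattening being requested (illustrative only; the maxDepth parameter and function names are hypothetical, not an existing OpenObserve option):

package main

import (
    "encoding/json"
    "fmt"
)

// flatten expands nested objects into prefixed keys ("fn34_volage_sample", ...)
// but stops recursing once maxDepth is reached; anything deeper is kept as a
// raw JSON string. maxDepth=1 corresponds to "only parse the first level".
func flatten(prefix string, value interface{}, depth, maxDepth int, out map[string]interface{}) {
    obj, isObj := value.(map[string]interface{})
    _, isArr := value.([]interface{})
    if !isObj || depth >= maxDepth {
        if isObj || isArr {
            raw, _ := json.Marshal(value)
            out[prefix] = string(raw) // keep the nested value as a single string field
        } else {
            out[prefix] = value
        }
        return
    }
    for k, v := range obj {
        key := k
        if prefix != "" {
            key = prefix + "_" + k
        }
        flatten(key, v, depth+1, maxDepth, out)
    }
}

func main() {
    line := `{"station_sn":"0061997204","fn":52,"fn34":{"volage_sample":11.96,"rain_sample":[0,0,0]}}`
    var record map[string]interface{}
    _ = json.Unmarshal([]byte(line), &record)

    out := map[string]interface{}{}
    flatten("", record, 0, 1, out) // parse only the first level
    fmt.Println(out)               // fn34 stays one string field instead of many
}

With maxDepth set high enough, the same function produces the fully flattened fn34_* fields, so a single knob could cover both behaviors.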

Alternatives considered

For the time being, our only workaround is to serialize the nested part of the log to a string first, so that zinc recognizes it as a string instead of JSON.
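A sketch of that workaround on the zap side (pre-serializing the nested payload so the ingester only sees one string field; names and values are illustrative):

package main

import (
    "encoding/json"

    "go.uber.org/zap"
)

func main() {
    logger, _ := zap.NewProduction() // JSON encoder
    defer logger.Sync()

    // Marshal the nested payload ourselves and log it as a plain string,
    // so the stored field is a string rather than a nested JSON object.
    fn34 := map[string]interface{}{"volage_sample": 11.96, "rain_sample_timestamp": 1689127200}
    raw, err := json.Marshal(fn34)
    if err != nil {
        logger.Error("marshal fn34 failed", zap.Error(err))
        return
    }

    logger.Info("handleFN34",
        zap.String("stationSN", "0061997204"),
        zap.String("data", string(raw))) // "data" arrives as one string field
}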

colorado666 avatar Jul 12 '23 04:07 colorado666

sample data:

{
  "station_sn":"0061997204","category":"48H","fn":52,
  "fn34":{
    "rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],
    "rain_sample_timestamp":1689127200,
    "rain_accumu_timestamp":1689127200,
    "water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],
    "water_sample_timestamp":1689127200,
    "volage_sample":11.96,
    "volage_sample_timestamp":1689127200
  }
}

prabhatsharma avatar Jul 12 '23 12:07 prabhatsharma

This can be handled using https://github.com/openobserve/openobserve/issues/1081

prabhatsharma avatar Jul 12 '23 12:07 prabhatsharma

We have already made some changes for this issue: now we only flatten objects and skip arrays. The result looks like this:

Original data:

{
  "station_sn":"0061997204","category":"48H","fn":52,
  "fn34":{
    "rain_sample":[0,0,0,0,0,0,0,0,0,0,0,0],
    "rain_sample_timestamp":1689127200,
    "rain_accumu_timestamp":1689127200,
    "water_sample":[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35],
    "water_sample_timestamp":1689127200,
    "volage_sample":11.96,
    "volage_sample_timestamp":1689127200
  }
}

Ingested data in OpenObserve:

{
    "_timestamp": 1691129227148152,
    "category": "48H",
    "fn": 52,
    "fn34_rain_accumu_timestamp": 1689127200,
    "fn34_rain_sample": "[0,0,0,0,0,0,0,0,0,0,0,0]",
    "fn34_rain_sample_timestamp": 1689127200,
    "fn34_volage_sample": 11.96,
    "fn34_volage_sample_timestamp": 1689127200,
    "fn34_water_sample": "[655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35,655.35]",
    "fn34_water_sample_timestamp": 1689127200,
    "station_sn": "0061997204"
}

Details in #1323
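For readers following along, a rough Go sketch of this "flatten objects, keep arrays as strings" behavior (OpenObserve's actual implementation is in Rust in the PR above; this is only an illustration of the transformation shown in the example):

package main

import (
    "encoding/json"
    "fmt"
)

// flattenObjects recursively expands nested objects into prefix_key fields,
// while arrays are serialized back to a JSON string, matching the ingested
// record shown above (e.g. fn34_rain_sample becomes the string "[0,0,0]").
func flattenObjects(prefix string, value interface{}, out map[string]interface{}) {
    switch v := value.(type) {
    case map[string]interface{}:
        for k, child := range v {
            key := k
            if prefix != "" {
                key = prefix + "_" + k
            }
            flattenObjects(key, child, out)
        }
    case []interface{}:
        raw, _ := json.Marshal(v)
        out[prefix] = string(raw) // arrays are skipped, i.e. kept as string values
    default:
        out[prefix] = v
    }
}

func main() {
    line := `{"station_sn":"0061997204","fn34":{"rain_sample":[0,0,0],"volage_sample":11.96}}`
    var record map[string]interface{}
    _ = json.Unmarshal([]byte(line), &record)

    out := map[string]interface{}{}
    flattenObjects("", record, out)
    fmt.Println(out) // map[fn34_rain_sample:[0,0,0] fn34_volage_sample:11.96 station_sn:0061997204]
}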

hengfeiyang avatar Aug 04 '23 06:08 hengfeiyang

Wow, such great news @hengfeiyang! 🥳

Can we use DataFusion SQL scalar functions on these new arrays? Like array_has?

IOITI avatar Aug 12 '23 21:08 IOITI

@IOITI In the backend, the array is stored as a string. For example:

record:

{"field1":"I am a message", "field2": [1,2,3,4]}

In storage, there will be two fields:

  • field1: i am a message
  • field2: "[1,2,3,4]", just a string value. It can't be searched as an array, but it can be searched as a string.
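In other words, a query can only match the array by its text representation. A tiny Go illustration of what "search as a string" amounts to (assuming the stored value is the literal JSON text):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // field2 as stored: the literal JSON text of the array, not an array type.
    stored := "[1,2,3,4]"

    fmt.Println(strings.Contains(stored, "2,3")) // true – substring match works
    fmt.Println(stored == "[1,2,3,4]")           // true – exact string match works
    // There is no element-wise lookup (e.g. array_has) on this field,
    // because the backend never sees it as an array.
}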

hengfeiyang avatar Aug 13 '23 03:08 hengfeiyang