
Parsing Json data - Flattening columns generates 'null' if column contains '(B)'

Open Rachmaninoffff opened this issue 3 years ago • 2 comments

Affected Version

0.22.1

Description


  • Cluster size: 3 (one master, one query, one data)
  • Configurations in use: defaults, no changes

Flattening columns generates 'null' if the column name contains '(B)', I guess because of the '()' characters. [screenshot attached]

Rachmaninoffff avatar Jun 28 '22 10:06 Rachmaninoffff

Original row:

```json
{
  "severity_label": "1",
  "severity": "9",
  "geoip": {
    "country_(B)name": "xxzhongguo",
    "city_name": "xxhaerbin",
    "owner_domain": "null",
    "region_name": "xxheilongjian",
    "isp_domain": {
      "Chengdu": "Liangjie",
      "beijing": {
        "YJZ": "Liangjie",
        "shuzu": ["1", "2", "dasd"]
      }
    },
    "ip": "221.207.218.153",
    "time__s__": "20220622 07:26:15"
  },
  "huanqiu": {
    "country_name": "xxzhongguo",
    "city_name": "xxhaerbin",
    "owner_domain": "null",
    "region_name": "xxheilongjian",
    "isp_domain": {
      "Chengdu": "Liangjie",
      "beijing": "tianjing"
    },
    "ip": "221.207.218.153",
    "time__s__": "20220622 07:26:15"
  },
  "vpn_name": " SSLvpn YGBX",
  "facility_label": "kernel",
  "EventID": "45140632",
  "sys_type": "190",
  "result": "passed",
  "user_ip": "10.222.85.103",
  "priority": "10",
  "UserCode": "BA5202999999",
  "@timestamp": "2022-06-28T17:32:52.873080Z",
  "module": "VPN",
  "vpn_ip": "10.249.192.84",
  "os_user": "root",
  "type": "network",
  "time": "Dec 11 08:33:29",
  "OS": "Windows 7 Ultimate - 7601.win7sp1 ldr escrow.1903051700",
  "client_version": "1.4.9.1274",
  "@version": "1",
  "SPI": "Ob5cf596",
  "MAC": "44:37:E6:32:55:60",
  "host": "10.10.220.25",
  "client_ip": "221.207.218.153",
  "tags": ["~rokparsefailure sysloginput"],
  "user": "BA52020009255",
  "facility": "11"
}
```
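A possible workaround (an assumption on my part, not verified against 0.22.1) is to declare the affected column explicitly in the ingestion spec's `flattenSpec` instead of relying on automatic discovery, using JsonPath bracket notation so the parentheses sit inside a quoted key rather than in dot notation; the output `name` shown here is just an illustrative choice:

```json
{
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      {
        "type": "path",
        "name": "geoip_country_name",
        "expr": "$.geoip['country_(B)name']"
      }
    ]
  }
}
```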

Rachmaninoffff avatar Jun 28 '22 10:06 Rachmaninoffff

You're right, this is due to the "()" in the name of a JSON node.

Because this is caused by JsonPath, which Druid uses to flatten JSON objects, I don't think this problem can be addressed in the short term. Until then, you will have to do some ETL to process such "invalid" characters.
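The suggested ETL step could be sketched as a small pre-processing script that rewrites offending keys before the rows reach Druid. This is a minimal sketch under my own assumptions (the function name, the `_` replacement character, and the sample row are illustrative, not part of Druid):

```python
import json
import re


def sanitize_keys(node):
    """Recursively replace '(' and ')' in JSON object keys with '_'."""
    if isinstance(node, dict):
        return {re.sub(r"[()]", "_", k): sanitize_keys(v) for k, v in node.items()}
    if isinstance(node, list):
        return [sanitize_keys(item) for item in node]
    return node


# Trimmed-down version of the problematic row above.
raw = '{"geoip": {"country_(B)name": "xxzhongguo", "ip": "221.207.218.153"}}'
clean = sanitize_keys(json.loads(raw))
print(json.dumps(clean))
# → {"geoip": {"country__B_name": "xxzhongguo", "ip": "221.207.218.153"}}
```

After this pass, every key is free of parentheses, so Druid's default dot-notation flattening paths become valid JsonPath expressions.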

FrankChen021 avatar Jul 09 '22 03:07 FrankChen021