Slow parsing

Open jameschenjav opened this issue 3 years ago • 2 comments

I have a JSON file containing 138 instance records, which I grabbed from EC2 describe-instances:

❯ jq 'length' ../tmp/2.json
138

Each record is slightly different from the example output: it is just flattened, with an accountId added:

{
  "accountId": "1234",
  "instance": {
    "AmiLaunchIndex": 0,
    "ImageId": "ami-0abcdef1234567890",
    ...,
    "Tags": [
      {
        "Key": "domain_name",
        "Value": "foo.bar.com"
      },
      {
        "Key": "git_info",
        "Value": "V2.8.7.01-123-1111111"
      },
      {
        "Key": "RebootSetting",
        "Value": "[{\"Zone\": \"NZ\", \"Default\": {\"MF\": \"7-22\", \"SS\": \"0-0\"}}]"
      },
      {
        "Key": "os_version",
        "Value": "20.04"
      },
      {
        "Key": "region",
        "Value": "nz"
      },
      {
        "Key": "customer",
        "Value": "bar_group"
      },
      {
        "Key": "environment",
        "Value": "non-production"
      },
      {
        "Key": "rds",
        "Value": "xyz.rds.amazonaws.com"
      },
      {
        "Key": "Name",
        "Value": "FOO-BAR"
      },
      {
        "Key": "aws_account_name",
        "Value": "FOO-NonProd"
      },
      {
        "Key": "AutoShutdown",
        "Value": "True"
      },
      {
        "Key": "AutoStart",
        "Value": "True"
      },
      {
        "Key": "application_version",
        "Value": "2.8.7"
      },
      {
        "Key": "Create_Auto_Alarms",
        "Value": "2022-04-26 02:46:11.030953"
      },
      {
        "Key": "usage",
        "Value": "insurance"
      }
    ],
    ...
  }
}

Here is my original jq expression. It is meant to filter some tags and convert the tags from { Key: string, Value: string }[] to objects with camelCase keys:

[
  .[]
    | select(.instance.Tags != null)
    | . as $instance
    | .instance | ({
      "accountId": $instance.accountId,
      "imageId": .ImageId,
      "instanceId": .InstanceId,
      "instanceType": .InstanceType,
      "keyName": .KeyName,
      "state": .State.Name,
      "tags": (.Tags
        | map({
            key: (.Key | gsub("_(?<a>[a-z])"; .a|ascii_upcase) | (.[0:1] | ascii_downcase) + .[1:]),
            value: .Value
          })
        | sort_by(.key)
        | from_entries
      )
    })
]

However, it takes 20s to run and results in an error: image

So I simplified tags to "tags": (.Tags | map({ (.Key): .Value }) | add), but the result is still very slow: image image
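
For readers less familiar with jq, here is a rough Python sketch of what the two tag transformations above are intended to compute. The function names (camel_key, tags_to_object, tags_to_object_simple) are hypothetical and belong to neither jq nor jaq. The sketch only mirrors the intended result of the filters; it says nothing about the parsing slowdown itself.

```python
import re

def camel_key(key: str) -> str:
    # Mirrors gsub("_(?<a>[a-z])"; .a|ascii_upcase):
    # drop each "_" that precedes a lowercase ASCII letter and upcase that letter.
    key = re.sub(r"_([a-z])", lambda m: m.group(1).upper(), key)
    # Mirrors (.[0:1] | ascii_downcase) + .[1:]: lowercase only an ASCII first letter.
    first = key[:1]
    if "A" <= first <= "Z":
        first = first.lower()
    return first + key[1:]

def tags_to_object(tags):
    # Mirrors map({key, value}) | sort_by(.key) | from_entries.
    entries = sorted(
        ((camel_key(t["Key"]), t["Value"]) for t in tags),
        key=lambda entry: entry[0],
    )
    return dict(entries)

def tags_to_object_simple(tags):
    # Mirrors map({ (.Key): .Value }) | add; jq's add yields null for an empty array.
    if not tags:
        return None
    merged = {}
    for t in tags:
        merged[t["Key"]] = t["Value"]
    return merged
```

With the sample record above, "domain_name" becomes "domainName" and "RebootSetting" becomes "rebootSetting". "Create_Auto_Alarms" becomes "create_Auto_Alarms", because the pattern only matches an underscore followed by a lowercase letter.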

I also have a suggestion, since jaq is a clone of jq. With jq I can do

jq -f ./my_filter.jq ./data.json

With jaq I have to

jaq $(cat ./my_filter.jq) < ./data.json

Also, the jq filter file cannot contain any special characters such as newlines, because its contents are passed as a command-line argument. This is very inconvenient.

jameschenjav avatar Apr 30 '22 06:04 jameschenjav

Oh, that sounds like a deficiency in the parser! Unfortunately, I will not get around to diagnosing it right now, as I'm going on holiday for several weeks tomorrow. On the topic of -f, I already have plans to support it.

01mf02 avatar May 01 '22 08:05 01mf02

I just implemented -f. Thanks for your suggestion. I was also able to reproduce your performance issue; parsing your filter (and giving the error about gsub) takes 27 seconds here. It might take me some time to correct this, though ...

01mf02 avatar May 19 '22 10:05 01mf02

I have now implemented faster precedence parsing in 012c9c5, which makes your example (fail to) parse after only 0.009 seconds (instead of 27 seconds before).
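
As background on precedence parsing in general, one well-known way to parse binary operators in a single pass is precedence climbing. The Python sketch below illustrates that general technique only. It is not taken from the commit above and may differ from what jaq actually does. The operator table, the tokenizer, and the tuple-based tree are simplifying assumptions, and all operators are treated as left-associative.

```python
import re

# Hypothetical operator table: a higher number binds tighter.
PRECEDENCE = {"|": 1, "+": 2, "-": 2, "*": 3, "/": 3}

def tokenize(src):
    # Atoms are runs of word characters and dots (e.g. ".a", "2"); whitespace is skipped.
    return re.findall(r"[.\w]+|[|+\-*/()]", src)

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_atom():
        tok = advance()
        if tok == "(":
            node = parse_expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in PRECEDENCE or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    def parse_expr(min_prec):
        # Parse one atom, then absorb operators that bind at least as tightly as min_prec.
        left = parse_atom()
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # Requiring a strictly higher precedence on the right gives left associativity.
            right = parse_expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = parse_expr(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For example, parse(tokenize(".a + .b * 2 | .c")) groups the multiplication first, then the addition, and applies the pipe last. Each token is consumed exactly once.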

01mf02 avatar Sep 26 '22 09:09 01mf02