Slow parsing
I have a JSON file containing 138 instance records, grabbed from EC2 `describe-instances`:
❯ jq 'length' ../tmp/2.json
138
It's slightly different from the example, just flattened with an accountId:
{
  "accountId": "1234",
  "instance": {
    "AmiLaunchIndex": 0,
    "ImageId": "ami-0abcdef1234567890",
    ...,
    "Tags": [
      { "Key": "domain_name", "Value": "foo.bar.com" },
      { "Key": "git_info", "Value": "V2.8.7.01-123-1111111" },
      { "Key": "RebootSetting", "Value": "[{\"Zone\": \"NZ\", \"Default\": {\"MF\": \"7-22\", \"SS\": \"0-0\"}}]" },
      { "Key": "os_version", "Value": "20.04" },
      { "Key": "region", "Value": "nz" },
      { "Key": "customer", "Value": "bar_group" },
      { "Key": "environment", "Value": "non-production" },
      { "Key": "rds", "Value": "xyz.rds.amazonaws.com" },
      { "Key": "Name", "Value": "FOO-BAR" },
      { "Key": "aws_account_name", "Value": "FOO-NonProd" },
      { "Key": "AutoShutdown", "Value": "True" },
      { "Key": "AutoStart", "Value": "True" },
      { "Key": "application_version", "Value": "2.8.7" },
      { "Key": "Create_Auto_Alarms", "Value": "2022-04-26 02:46:11.030953" },
      { "Key": "usage", "Value": "insurance" }
    ],
    ...
  }
}
Here is my original jq expression; it filters out some fields and converts the tags from `{ Key: string, Value: string }[]` to objects with camelCase keys:
[
.[]
| select(.instance.Tags != null)
| . as $instance
| .instance | ({
"accountId": $instance.accountId,
"imageId": .ImageId,
"instanceId": .InstanceId,
"instanceType": .InstanceType,
"keyName": .KeyName,
"state": .State.Name,
"tags": (.Tags
| map({
key: (.Key | gsub("_(?<a>[a-z])"; .a|ascii_upcase) | (.[0:1] | ascii_downcase) + .[1:]),
value: .Value
})
| sort_by(.key)
| from_entries
)
})
]
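For context, the trickiest part of the filter above is the snake_case-to-camelCase conversion of the tag keys. The same logic can be mirrored outside jq; here is a minimal Python sketch of the regex step (the function name `camel_case` is my own, not part of any tool here):

```python
import re

def camel_case(key: str) -> str:
    # Same idea as the jq expression:
    # gsub("_(?<a>[a-z])"; .a|ascii_upcase), then lowercase the first character.
    s = re.sub(r"_([a-z])", lambda m: m.group(1).upper(), key)
    return s[:1].lower() + s[1:]

print(camel_case("aws_account_name"))  # awsAccountName
print(camel_case("Name"))              # name
```

Note that, like the jq original, this only upcases letters that follow an underscore when they are lowercase, so a key such as "Create_Auto_Alarms" keeps its inner underscores.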
However, it takes about 20 seconds to run and results in an error:

So I simplified the tags expression to `"tags": (.Tags | map({ (.Key): .Value }) | add)`, but it is still very slow:

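For reference, that simplified `map({ (.Key): .Value }) | add` step only turns the tag list into a single object, which in Python terms (with made-up sample data) is just a dict comprehension:

```python
# Sample tag list in the EC2 { Key, Value } shape (made-up values).
tags = [
    {"Key": "region", "Value": "nz"},
    {"Key": "customer", "Value": "bar_group"},
]

# jq: .Tags | map({ (.Key): .Value }) | add
tag_obj = {t["Key"]: t["Value"] for t in tags}
print(tag_obj)  # {'region': 'nz', 'customer': 'bar_group'}
```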
I also have a suggestion, since jaq is a clone of jq. With jq I can do
jq -f ./my_filter.jq ./data.json
but with jaq I have to do
jaq $(cat ./my_filter.jq) < ./data.json
and the filter file cannot contain any special characters such as newlines, since the filter becomes part of the command-line argument. It is quite inconvenient to use.
Oh, that sounds like a deficiency in the parser!
I will unfortunately not get around to diagnosing it right now, as I am going on holiday for several weeks starting tomorrow.
On the topic of -f, I do already have plans to support it.
I just implemented -f. Thanks for your suggestion.
I was also able to reproduce your performance issue; parsing your filter (and giving the error about gsub) takes 27 seconds here.
It might take me some time to correct this, though ...
I have now implemented faster precedence parsing in 012c9c5, which makes your example (fail to) parse after only 0.009 seconds (instead of 27 seconds before).
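For anyone curious what "precedence parsing" means here: a precedence-climbing (Pratt-style) parser consumes a chain of binary operators in a single linear pass instead of recursing through one grammar rule per precedence level. The following is a toy illustration only, not jaq's actual code, and the operator table is a made-up subset:

```python
# Toy precedence-climbing parser (illustrative only; not jaq's implementation).
# Higher number = binds tighter; this table is a made-up subset of jq's operators.
OPS = {"|": 1, ",": 2, "+": 3, "*": 4}

def parse_expr(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] into a nested (op, lhs, rhs) tuple."""
    lhs, pos = tokens[pos], pos + 1
    while pos < len(tokens) and tokens[pos] in OPS and OPS[tokens[pos]] >= min_prec:
        op = tokens[pos]
        # Parse the right-hand side at a higher minimum precedence,
        # which makes operators left-associative here.
        rhs, pos = parse_expr(tokens, pos + 1, OPS[op] + 1)
        lhs = (op, lhs, rhs)
    return lhs, pos

ast, _ = parse_expr(["a", "+", "b", "*", "c"])
print(ast)  # ('+', 'a', ('*', 'b', 'c'))
```

Each token is visited a bounded number of times, so parse time stays linear in the filter size rather than blowing up on long operator chains.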