groq-js
filtering performance question
More of a performance question than a bug report, but I don't see a more suitable location.
I get that groq has capabilities that JMESPath does not, but as they do share some similarities I'm comparing them.
I need to filter some JSON data. For this test I'm working with 10k entries. Here is an example that shows 2 records. The real data has a few more columns, some with a bit of nested complexity, but my queries don't touch any of that.
[{
"base__uid": 1664200,
"base__chrom": "chr6",
"base__pos": 1312763,
"base__ref_base": "T"
},{
"base__uid": 1669279,
"base__chrom": "chr6",
"base__pos": 4116028,
"base__ref_base": "G"
}]
In JMESPath this query:
[?base__chrom=='chr6']|[?base__pos>=`1000000`]|[?base__pos<=`5000000`]
takes 7ms
and it seems to be equivalent to this groq query
*[base__chrom == 'chr6' && base__pos >= 1000000 && base__pos <= 5000000]
which takes 5662ms.
So far my queries aren't particularly complex, but regardless of their complexity, runtime for groq seems linear and dominated by the number of records in my input. At 100k records it always takes ~30s, at 10k records ~3s, and so on.
I'm filtering with this code
value = await evaluate(groqtree, { dataset: allrec });
accepted = await value.get();
Do my performance numbers seem appropriate? Anything I should dig into in hopes of getting better performance from groq?
> So far my queries aren't particularly complex, but regardless of their complexity, runtime for groq seems linear and dominated by the number of records in my input.
This is expected. groq-js is a naive implementation and doesn't index the documents in any way to speed up query performance. Right now there's no performance gain from using groq-js over just calling .filter(…) in JavaScript.
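For comparison, the filter from the original question can be written directly with Array.prototype.filter. This is only a minimal sketch in plain JavaScript, not part of groq-js: the helper name filterByRegion is hypothetical, and it assumes the records are plain objects shaped like the sample in the question.

```javascript
// Hypothetical helper: a plain-JavaScript counterpart of the groq query
// *[base__chrom=='chr6' && base__pos >= 1000000 && base__pos <= 5000000]
function filterByRegion(records, chrom, minPos, maxPos) {
  return records.filter(
    (r) => r.base__chrom === chrom && r.base__pos >= minPos && r.base__pos <= maxPos
  );
}

// The two sample records from the question.
const allrec = [
  { base__uid: 1664200, base__chrom: "chr6", base__pos: 1312763, base__ref_base: "T" },
  { base__uid: 1669279, base__chrom: "chr6", base__pos: 4116028, base__ref_base: "G" },
];

const accepted = filterByRegion(allrec, "chr6", 1000000, 5000000);
console.log(accepted.length); // 2, both sample records fall inside the range
```

Like the groq query, this is a single linear pass over the records, so it gives a rough baseline for what an unindexed filter costs on the same data.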
> which takes 5662ms.
As for the performance itself: no specific work has been done yet on making it efficient. There is probably a lot of low-hanging fruit that could speed things up.
I'm running into similar performance issues. I'm attempting to make a reusable groq testing library, and for some queries across 1500 entries, it's taking 15+ seconds. These queries are perfectly fast in production.
I assume that the lack of indexing would essentially prevent this from running any faster ... do you think there's any possibility that groq-js could add support for indexing?
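To make the idea of indexing concrete, here is a minimal sketch in plain JavaScript. It is not a groq-js feature and says nothing about how groq-js might implement it: buildIndex and queryIndex are hypothetical helpers, and the third record is made up for illustration. The sketch groups records by one field in a Map, so an equality test on that field narrows the candidates before the remaining conditions run.

```javascript
// Hypothetical helper: group records by the value of one field.
function buildIndex(records, field) {
  const index = new Map();
  for (const r of records) {
    const key = r[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(r);
  }
  return index;
}

// Hypothetical helper: fetch the bucket for a key, then apply the
// remaining conditions only to the records in that bucket.
function queryIndex(index, key, predicate) {
  const candidates = index.get(key) || [];
  return candidates.filter(predicate);
}

const records = [
  { base__uid: 1664200, base__chrom: "chr6", base__pos: 1312763 },
  { base__uid: 1669279, base__chrom: "chr6", base__pos: 4116028 },
  { base__uid: 1, base__chrom: "chr1", base__pos: 2000000 }, // made-up record
];

const byChrom = buildIndex(records, "base__chrom");
const hits = queryIndex(
  byChrom,
  "chr6",
  (r) => r.base__pos >= 1000000 && r.base__pos <= 5000000
);
console.log(hits.map((r) => r.base__uid)); // [ 1664200, 1669279 ]
```

Building the index is itself a linear pass, so it only pays off when the same dataset is queried many times, as in a test suite that runs many queries against one fixture.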