groq-js

filtering performance question

Open · cariaso opened this issue 4 years ago · 2 comments

This is more of a performance question than a bug report, but I don't see a more suitable place to ask.

I understand that groq has capabilities JMESPath lacks, but since the two share some similarities, I'm comparing them.

I need to filter some JSON data. For this test I'm working with 10k entries. Here is an example showing 2 records. The real data has a few more fields, some with a bit of nested complexity, but my queries don't touch any of that.

[{
  "base__uid": 1664200,
  "base__chrom": "chr6",
  "base__pos": 1312763,
  "base__ref_base": "T"
},{
  "base__uid": 1669279,
  "base__chrom": "chr6",
  "base__pos": 4116028,
  "base__ref_base": "G"
}]
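To reproduce timings like these without the real data, here is a minimal sketch that generates synthetic records with the same four fields as the sample. The names `makeRecords`, `CHROMS`, and `BASES`, and the value ranges, are hypothetical; the values are random and not drawn from the actual data set.

```javascript
// Hypothetical generator of synthetic records with the same fields as the sample.
// Values are random; they are NOT taken from the real data set.
const CHROMS = ['chr1', 'chr2', 'chr6', 'chrX'];
const BASES = ['A', 'C', 'G', 'T'];

// Return a uniformly random element of `items`.
function pick(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// Build `n` records shaped like { base__uid, base__chrom, base__pos, base__ref_base }.
function makeRecords(n) {
  const records = [];
  for (let i = 0; i < n; i++) {
    records.push({
      base__uid: i + 1,
      base__chrom: pick(CHROMS),
      base__pos: Math.floor(Math.random() * 10_000_000),
      base__ref_base: pick(BASES),
    });
  }
  return records;
}

// e.g. const allrec = makeRecords(10_000);
```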

In JMESPath this query:

[?base__chrom=='chr6']|[?base__pos>=`1000000`]|[?base__pos<=`5000000`]

takes 7ms

and it seems to be equivalent to this groq query:

*[base__chrom=='chr6' && base__pos>= 1000000 && base__pos <= 5000000]

which takes 5662ms.

So far my queries aren't particularly complex, but regardless of their complexity, runtime for groq seems linear and dominated by the number of records in my input. At 100k records it always takes ~30s, at 10k records ~3s, and so on.
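Dividing the reported figures by record count (plain arithmetic on the numbers in this post, nothing newly measured) makes the linear scaling concrete and puts the JMESPath gap in per-record terms:

$$
\frac{30\,\text{s}}{100{,}000} = \frac{3\,\text{s}}{10{,}000} = 0.3\,\text{ms per record}, \qquad \frac{7\,\text{ms}}{10{,}000} = 0.7\,\mu\text{s per record}
$$

For the direct 10k comparison, 5662 ms versus 7 ms is a ratio of roughly 800.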

I'm filtering with this code:

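       import { evaluate } from 'groq-js';
       // groqtree: the query parsed with parse(); allrec: the array of 10k records
       const value = await evaluate(groqtree, { dataset: allrec });
       const accepted = await value.get();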
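To see whether the time is spent in `evaluate()` itself or in `value.get()`, one could time the two calls separately. A minimal sketch, assuming a runtime with the global `performance.now()` (modern browsers, Node.js 16+); `timeAsync` is a hypothetical helper, not part of groq-js.

```javascript
// Hypothetical helper: await an async function and report elapsed wall-clock time.
async function timeAsync(label, fn) {
  const start = performance.now();
  const result = await fn();
  const ms = performance.now() - start;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
  return { result, ms };
}

// Usage with the code above (groqtree and allrec as in the original post):
// const { result: value } = await timeAsync('evaluate', () => evaluate(groqtree, { dataset: allrec }));
// const { result: accepted } = await timeAsync('get', () => value.get());
```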

Do these performance numbers seem reasonable? Is there anything I should dig into to get better performance from groq?

cariaso, Dec 13 '20 08:12

> So far my queries aren't particularly complex, but regardless of their complexity, runtime for groq seems linear and dominated by the number of records in my input.

This is expected. groq-js is a naive implementation and doesn't index the documents in any way to speed up queries. Right now there's no performance gain from using groq-js over just calling .filter(…) in JavaScript.
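For comparison, here is a minimal sketch of the query from the original post written as a plain `Array.prototype.filter` call; `filterRegion` is a hypothetical name. It is only equivalent for records where `base__chrom` and `base__pos` are present, since GROQ and JavaScript treat missing fields in comparisons differently.

```javascript
// Plain-JavaScript counterpart of the groq query
// *[base__chrom=='chr6' && base__pos >= 1000000 && base__pos <= 5000000]
// (hypothetical helper; assumes both fields exist on every record)
function filterRegion(records, chrom, minPos, maxPos) {
  return records.filter(
    (r) => r.base__chrom === chrom && r.base__pos >= minPos && r.base__pos <= maxPos
  );
}

// The two sample records from the original post.
const sample = [
  { base__uid: 1664200, base__chrom: 'chr6', base__pos: 1312763, base__ref_base: 'T' },
  { base__uid: 1669279, base__chrom: 'chr6', base__pos: 4116028, base__ref_base: 'G' },
];

console.log(filterRegion(sample, 'chr6', 1000000, 5000000).length); // 2
```

Like groq-js, this scans every record once, so its cost also grows linearly with the input size.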

> which takes 5662ms.

As for the performance itself: no specific work has gone into making it efficient yet. There may be a lot of low-hanging fruit that could speed it up.

judofyr, Dec 16 '20 12:12

I'm running into similar performance issues. I'm trying to build a reusable groq testing library, and some queries across 1500 entries take 15+ seconds. The same queries are perfectly fast in production. I assume the lack of indexing essentially prevents this from running any faster. Do you think groq-js could add support for indexing?
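To illustrate what an index could buy for queries shaped like the one in the original post, here is a minimal sketch. This is not a groq-js feature, and `buildIndex`, `lowerBound`, and `queryRange` are hypothetical names. It groups records by `base__chrom` once, sorts each group by `base__pos`, and then answers a range query with a binary search, costing O(log n + k) per query (k = number of hits) instead of a full scan.

```javascript
// Hypothetical index: Map from base__chrom to that chromosome's records sorted by base__pos.
function buildIndex(records) {
  const index = new Map();
  for (const r of records) {
    if (!index.has(r.base__chrom)) index.set(r.base__chrom, []);
    index.get(r.base__chrom).push(r);
  }
  for (const group of index.values()) {
    group.sort((a, b) => a.base__pos - b.base__pos);
  }
  return index;
}

// Binary search: first index i with group[i].base__pos >= pos (group.length if none).
function lowerBound(group, pos) {
  let lo = 0;
  let hi = group.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (group[mid].base__pos < pos) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Records with base__chrom === chrom and minPos <= base__pos <= maxPos.
function queryRange(index, chrom, minPos, maxPos) {
  const group = index.get(chrom);
  if (!group) return [];
  const out = [];
  for (let i = lowerBound(group, minPos); i < group.length && group[i].base__pos <= maxPos; i++) {
    out.push(group[i]);
  }
  return out;
}
```

The index is built once up front (O(n log n)) and only pays off when many queries run against the same data. Results come back ordered by `base__pos`, not in the original dataset order that `*[...]` preserves.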

scottrippey, Dec 25 '22 20:12