ActiveData
Provide high speed filtering and aggregation over data
By using `adr inspect --table repo`, I see that there exists a `repo.changeset.backedoutby` key. However, it doesn't exist for changesets that were backed out. For example, take this changeset: https://hg.mozilla.org/integration/autoland/rev/91bc05242c6f02a5d30c71557693129303e7067e...
ip-172-31-1-12 (pid 16298) - 2020-01-14 14:38:34.727486 - Unknown Thread 140089220822848 - "/home/ec2-user/ActiveData/vendor/jx_elasticsearch/meta.py:1036" - (__init__) - WARNING: Problem getting query path "task.artifacts" in snowflake "task" File "/home/ec2-user/ActiveData/vendor/jx_elasticsearch/meta.py", line 1036, in __init__...
The current behaviour is to return 10 values if there is no `limit`. This keeps the simplest query simple (`{"from":table}`), but it can be confusing; despite the round number 10....
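Until the default changes, a caller can avoid the surprise by always sending a `limit`. The sketch below is only an illustration: `with_explicit_limit` and `DEFAULT_LIMIT` are hypothetical names, not part of ActiveData, and the table name `task` is borrowed from other queries in this list.

```python
import json

# Assumption: mirrors the implicit default of 10 rows described above.
DEFAULT_LIMIT = 10


def with_explicit_limit(query, limit=DEFAULT_LIMIT):
    """Return a copy of `query` that always carries a `limit`.

    An existing `limit` is left untouched; the input dict is not modified.
    """
    explicit = dict(query)
    explicit.setdefault("limit", limit)
    return explicit


# The simplest query stays simple for the author, but the request is explicit.
simplest = {"from": "task"}
print(json.dumps(with_explicit_limit(simplest)))

# A query that already sets a limit passes through unchanged.
print(json.dumps(with_explicit_limit({"from": "task", "limit": 1000})))
```

Making the limit visible in the request body means a reader of the query can see why only a handful of rows came back.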
```
"edges":[{
    "allowNulls":false,
    "domain":{
        "interval":"week",
        "max":"today",
        "min":"today-1year",
        "type":"time",
        "sort":-1
    },
    "name":"date",
    "value":"action.end_time"
}],
```
The query interpreter happily throws an error when it cannot work with the query handed to it. This makes for some ugly errors: ``` Call to ActiveData failed File...
My latest theory is that the network baseline transfer limit on the spot nodes is too low: ingestion and queries slow to a crawl while file transfers consume all available network....
When warnings are raised while processing a query, add them to the meta. Specifically, if a column does not exist (or has cardinality==0) and it results in zero documents matching, like......
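One possible shape for this, as a rough sketch only: the function names, the `warnings` key under `meta`, and the cardinality lookup are all hypothetical and do not describe the current ActiveData code.

```python
def column_warnings(referenced_columns, cardinality_by_column):
    """Collect a warning for each referenced column that is missing or empty.

    `cardinality_by_column` is an assumed lookup from column name to its
    known cardinality; a missing key means the column does not exist.
    """
    warnings = []
    for name in referenced_columns:
        cardinality = cardinality_by_column.get(name)
        if cardinality is None:
            warnings.append("column {!r} does not exist".format(name))
        elif cardinality == 0:
            warnings.append("column {!r} has cardinality 0".format(name))
    return warnings


def attach_warnings(response, warnings):
    """Append warnings to response["meta"]["warnings"] (hypothetical key)."""
    if warnings:
        meta = response.setdefault("meta", {})
        meta.setdefault("warnings", []).extend(warnings)
    return response


# Usage: a query that matched nothing because one column is unknown.
known = {"task.group.id": 1200, "run.result": 0}
response = {"data": [], "meta": {}}
attach_warnings(response, column_warnings(["task.group.id", "run.reslt", "run.result"], known))
print(response["meta"]["warnings"])
```

With something like this in place, an empty result would arrive together with the reason it is likely empty, rather than looking like a valid "no matches" answer.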
Example that works:
```
{
    "from":"task.task.tags",
    "limit":1000,
    "select":{
        "aggregate":"union",
        "name":"group_id",
        "value":"task.group.id"
    },
    "where":[
        {"eq":{"name":"description"}},
        {"find":{"task.tags.value":"Mozilla-LDAP|"}},
        {"gte":{"action.start_time":{"date":"1-jul-2019"}}},
        {"eq":{"treeherder.symbol":"rt"}}
    ]
}
```
Replacing `select` with a `groupby` clause will show fewer values....
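For comparing the two forms side by side, the `groupby` variant can be derived mechanically from the working query. This is a sketch under assumptions: `select_to_groupby` is a hypothetical helper, and the exact `groupby` clause used in the failing case is not shown above, so the `{"name", "value"}` form here is a guess at it.

```python
import copy
import json

# The working query from above, as a Python dict.
select_query = {
    "from": "task.task.tags",
    "limit": 1000,
    "select": {
        "aggregate": "union",
        "name": "group_id",
        "value": "task.group.id",
    },
    "where": [
        {"eq": {"name": "description"}},
        {"find": {"task.tags.value": "Mozilla-LDAP|"}},
        {"gte": {"action.start_time": {"date": "1-jul-2019"}}},
        {"eq": {"treeherder.symbol": "rt"}},
    ],
}


def select_to_groupby(query):
    """Return a copy of `query` with its single `select` turned into a `groupby`.

    Assumption: the groupby clause reuses the select's name and value and
    drops the aggregate. The original query is left unmodified.
    """
    converted = copy.deepcopy(query)
    select = converted.pop("select")
    converted["groupby"] = {"name": select["name"], "value": select["value"]}
    return converted


groupby_query = select_to_groupby(select_query)
print(json.dumps(groupby_query, indent=4))
```

Sending both queries with the same `where` and `limit` and diffing the returned group ids would show exactly which values the `groupby` form drops.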
The nested filter is not inserted into the ES query:
```
{
    "from":"treeherder.job_log.failure_line",
    "select":["expected","status","test"],
    "where":[
        {"eq":{"failure.classification":"intermittent"}},
        {"gte":{"action.start_time":{"date":"today-2week"}}},
        {"ne":{"run.result":"success"}},
        {"eq":{"action":"test_result"}},
        {"exists":"test"}
    ],
    "limit":10
}
```