Very slow contains filter from web UI
When executing a "contains" filter on a String column via the web UI, a regex pattern is compiled for every single comparison.
Ideally, this would instead use a io.deephaven.api.filter.FilterPattern / io.deephaven.engine.table.impl.select.WhereFilterPatternImpl.
"DeephavenApiServer-Scheduler-Concurrent-4" #46 [98] daemon prio=5 os_prio=0 cpu=216984.86ms elapsed=1072.72s tid=0x00007ad5bc001280 nid=98 runnable [0x00007ad5c5afc000]
java.lang.Thread.State: RUNNABLE
at java.util.regex.Pattern$CharPropertyGreedy.match([email protected]/Pattern.java:4470)
at java.util.regex.Matcher.match([email protected]/Matcher.java:1794)
at java.util.regex.Matcher.matches([email protected]/Matcher.java:754)
at java.util.regex.Pattern.matches([email protected]/Pattern.java:1222)
at java.lang.String.matches([email protected]/String.java:2969)
at io.deephaven.temp.c_631d8105a9581f7d820d880f95bc47c7d61844cc964870537972019e52f008bcv65_0.GeneratedFilterKernel.filter(GeneratedFilterKernel.java:91)
at io.deephaven.engine.table.impl.select.ConditionFilter$ChunkFilter.filter(ConditionFilter.java:363)
at io.deephaven.engine.table.impl.select.AbstractConditionFilter.filter(AbstractConditionFilter.java:262)
at io.deephaven.engine.table.impl.select.WhereFilter.filter(WhereFilter.java:236)
at io.deephaven.engine.table.impl.select.ConjunctiveFilter.andImpl(ConjunctiveFilter.java:56)
at io.deephaven.engine.table.impl.select.ConjunctiveFilter.filter(ConjunctiveFilter.java:66)
at io.deephaven.engine.table.impl.AbstractFilterExecution.doFilter(AbstractFilterExecution.java:132)
at io.deephaven.engine.table.impl.AbstractFilterExecution.lambda$scheduleCompletion$4(AbstractFilterExecution.java:286)
at io.deephaven.engine.table.impl.AbstractFilterExecution$$Lambda/0x00007ad66ca543e0.run(Unknown Source)
at io.deephaven.engine.table.impl.util.JobScheduler$IterationManager$TaskInvoker.execute(JobScheduler.java:258)
- locked <0x000000062d11dc88> (a io.deephaven.engine.table.impl.util.JobScheduler$IterationManager$TaskInvoker)
at io.deephaven.engine.table.impl.util.JobScheduler$IterationManager.lambda$startTasks$0(JobScheduler.java:164)
at io.deephaven.engine.table.impl.util.JobScheduler$IterationManager$$Lambda/0x00007ad66c7c1638.run(Unknown Source)
at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.lambda$submit$0(ImmediateJobScheduler.java:40)
at io.deephaven.engine.table.impl.util.ImmediateJobScheduler$$Lambda/0x00007ad66c7c1a80.run(Unknown Source)
at io.deephaven.engine.table.impl.util.ImmediateJobScheduler.submit(ImmediateJobScheduler.java:54)
at io.deephaven.engine.table.impl.util.JobScheduler$IterationManager.startTasks(JobScheduler.java:164)
at io.deephaven.engine.table.impl.util.JobScheduler.iterateSerial(JobScheduler.java:431)
at io.deephaven.engine.table.impl.AbstractFilterExecution.scheduleCompletion(AbstractFilterExecution.java:260)
at io.deephaven.engine.table.impl.QueryTable.lambda$whereInternal$27(QueryTable.java:1300)
at io.deephaven.engine.table.impl.QueryTable$$Lambda/0x00007ad66ca515f0.call(Unknown Source)
at io.deephaven.engine.table.impl.BaseTable.initializeWithSnapshot(BaseTable.java:1293)
at io.deephaven.engine.table.impl.QueryTable.lambda$whereInternal$28(QueryTable.java:1290)
at io.deephaven.engine.table.impl.QueryTable$$Lambda/0x00007ad66ca4ebe0.get(Unknown Source)
at io.deephaven.engine.table.impl.QueryTable.memoizeResult(QueryTable.java:3639)
at io.deephaven.engine.table.impl.QueryTable.lambda$whereInternal$29(QueryTable.java:1269)
at io.deephaven.engine.table.impl.QueryTable$$Lambda/0x00007ad66ca4a1e8.get(Unknown Source)
at io.deephaven.engine.table.impl.perf.QueryPerformanceRecorder.withNugget(QueryPerformanceRecorder.java:369)
at io.deephaven.engine.table.impl.QueryTable.whereInternal(QueryTable.java:1223)
at io.deephaven.engine.table.impl.QueryTable.where(QueryTable.java:1162)
at io.deephaven.engine.table.impl.QueryTable.where(QueryTable.java:100)
at io.deephaven.engine.table.impl.UncoalescedTable.where(UncoalescedTable.java:209)
at io.deephaven.engine.table.impl.UncoalescedTable.where(UncoalescedTable.java:43)
at io.deephaven.server.table.ops.FilterTableGrpcImpl.create(FilterTableGrpcImpl.java:57)
at io.deephaven.server.table.ops.FilterTableGrpcImpl.create(FilterTableGrpcImpl.java:30)
at io.deephaven.server.table.ops.TableServiceGrpcImpl$BatchExportBuilder.doExport(TableServiceGrpcImpl.java:757)
at io.deephaven.server.table.ops.TableServiceGrpcImpl$$Lambda/0x00007ad66c92c648.call(Unknown Source)
at io.deephaven.server.session.SessionState$ExportObject.doExport(SessionState.java:995)
at io.deephaven.server.session.SessionState$ExportObject$$Lambda/0x00007ad66c61ca20.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:572)
at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:317)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run([email protected]/ScheduledThreadPoolExecutor.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:642)
at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory.lambda$newThread$0(SchedulerModule.java:100)
at io.deephaven.server.runner.scheduler.SchedulerModule$ThreadFactory$$Lambda/0x00007ad66c498800.run(Unknown Source)
at java.lang.Thread.runWith([email protected]/Thread.java:1596)
at java.lang.Thread.run([email protected]/Thread.java:1583)
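To illustrate the cost visible in the trace above (String.matches calling Pattern.matches, which compiles the regex on each call), here is a minimal standalone sketch. It is not the generated filter kernel or any Deephaven code; the class and method names are hypothetical, and the regex is the one quoted later in this issue.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not Deephaven code.
final class PatternCompileSketch {
    // Regex reported in this issue for a "contains AP" filter.
    static final String REGEX = "(?s)(?i).*\\QAP\\E.*";

    // Mirrors the shape seen in the stack trace: String.matches delegates to
    // Pattern.matches, which compiles REGEX again for every value.
    static long countRecompiling(List<String> values) {
        long count = 0;
        for (String v : values) {
            if (v != null && v.matches(REGEX)) {
                count++;
            }
        }
        return count;
    }

    // Compiles the pattern once and reuses it for every value.
    static long countPrecompiled(List<String> values) {
        Pattern pattern = Pattern.compile(REGEX);
        long count = 0;
        for (String v : values) {
            if (v != null && pattern.matcher(v).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> values = List.of("AAPL", "apple", "MSFT", "line1\nap");
        System.out.println(countRecompiling(values));
        System.out.println(countPrecompiled(values));
    }
}
```

Both methods return the same count; the second only moves compilation out of the per-row loop, which is the same idea as filtering with a pre-computed pattern instance.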
Potentially related to #3784, #3425
Part of the issue might be the "unclear" semantics of the advanced filter "contains". It looks like the web UI is creating an 'InvokeCondition' w/ the "match" method and regex (?s)(?i).*\\QAP\\E.*. 1) It's not clear to me that this should be case-insensitive by default, and 2) 'ContainsCondition' would be much more appropriate and performant (regardless of whether it is case-insensitive).
I would argue it should be case-sensitive by default (for performance reasons), with a checkbox to turn it into a case-insensitive match.
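As a rough sketch of the semantics being proposed (case-sensitive by default, case-insensitive as an opt-in), a plain substring check needs no regex at all. The helper below is hypothetical and is not how 'ContainsCondition' is implemented on the server; it only shows the two modes side by side.

```java
// Hypothetical helper for illustration; not a Deephaven API.
final class ContainsSemanticsSketch {
    // Case-sensitive by default; ignoreCase is the opt-in "checkbox".
    static boolean contains(String value, String needle, boolean ignoreCase) {
        if (value == null) {
            return false;
        }
        if (!ignoreCase) {
            // Plain substring search, no pattern involved.
            return value.contains(needle);
        }
        // Case-insensitive scan: compare the needle at each candidate offset.
        for (int i = 0; i + needle.length() <= value.length(); i++) {
            if (value.regionMatches(true, i, needle, 0, needle.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains("AAPL", "AP", false));
        System.out.println(contains("apple", "AP", false));
        System.out.println(contains("apple", "AP", true));
    }
}
```

Note that String.regionMatches with ignoreCase compares characters using Java's own case-folding rules, which may differ from the regex (?i) flag for non-ASCII input, so this should be read as a sketch of the intent rather than an exact equivalent of the current filter.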
I do want to also argue the case for #3609 (revamping the Filter grpc API), also related to #3784.
I believe this is really a web UI issue, in two ways, not something we need to address in the server or JS API:
- First, the dh.FilterValue.invoke() call shouldn't be used here anyway; it should be dh.FilterValue.matches(pattern:FilterValue) (or matchesIgnoreCase) instead, which will be evaluated on the server as a FilterPattern, with a pre-computed pattern instance.
- Second, it might be even more correct/efficient to just use dh.FilterValue.contains() (or containsIgnoreCase) - today this will also evaluate to a FilterPattern on the server, but there's no specific reason that this must be true.
Thoughts @mofojed?
Note, web-client-ui ticket created / linked above. (I didn't realize at the time, but I think you can move issues from one repo to another.) Happy to close this one out.