arcadedb icon indicating copy to clipboard operation
arcadedb copied to clipboard

Error with containstext against an indexed property

Open marco-brandizi opened this issue 1 year ago • 10 comments

ArcadeDB Version: 23.3.1

JDK Version: OpenJDK 64-Bit Server VM Temurin-11.0.18+10 (build 11.0.18+10, mixed mode)

OS: macos 13.3 (22E252)

Expected behavior

(I've done all of the operations described in the follow from the studio web app)

I'm new to ArcadeDB and so far I've managed to import a graph database from a .graphml file. Then, while playing with indexes, I tried this:

# These two went fine
create property `Person:Resource`.`label` string
create index on `Person:Resource` ( `label` ) full_text

# This failed
select * from `Person:Resource` where `label` containstext 'John'

the select fails with the, while I was expecting some result.

Actual behavior

It fails with the following:

<ArcadeDB_0> Error on command execution (PostCommandHandler)
java.lang.UnsupportedOperationException: Cannot execute index query with `label` CONTAINSTEXT 'John'
	at com.arcadedb.query.sql.parser.BooleanExpression.resolveKeyFrom(BooleanExpression.java:203)
	at com.arcadedb.query.sql.executor.FetchFromIndexStep.indexKeyFrom(FetchFromIndexStep.java:537)
	at com.arcadedb.query.sql.executor.FetchFromIndexStep.processAndBlock(FetchFromIndexStep.java:280)
	at com.arcadedb.query.sql.executor.FetchFromIndexStep.init(FetchFromIndexStep.java:208)
	at com.arcadedb.query.sql.executor.FetchFromIndexStep.init(FetchFromIndexStep.java:192)
	at com.arcadedb.query.sql.executor.FetchFromIndexStep.syncPull(FetchFromIndexStep.java:85)
	at com.arcadedb.query.sql.executor.GetValueFromIndexEntryStep$1.fetchNextItem(GetValueFromIndexEntryStep.java:101)
	at com.arcadedb.query.sql.executor.GetValueFromIndexEntryStep$1.hasNext(GetValueFromIndexEntryStep.java:71)
	at com.arcadedb.query.sql.executor.DistinctExecutionStep.fetchNext(DistinctExecutionStep.java:93)
	at com.arcadedb.query.sql.executor.DistinctExecutionStep$1.hasNext(DistinctExecutionStep.java:61)
	at com.arcadedb.query.sql.executor.FilterByClassStep$1.fetchNextItem(FilterByClassStep.java:59)
	at com.arcadedb.query.sql.executor.FilterByClassStep$1.hasNext(FilterByClassStep.java:97)
	at com.arcadedb.query.sql.executor.ProjectionCalculationStep$1.hasNext(ProjectionCalculationStep.java:43)
	at com.arcadedb.query.sql.parser.LocalResultSet.fetchNext(LocalResultSet.java:46)
	at com.arcadedb.query.sql.parser.LocalResultSet.<init>(LocalResultSet.java:39)
	at com.arcadedb.query.sql.parser.SelectStatement.execute(SelectStatement.java:207)
	at com.arcadedb.query.sql.parser.Statement.execute(Statement.java:85)
	at com.arcadedb.query.sql.parser.Statement.execute(Statement.java:69)
	at com.arcadedb.query.sql.SQLQueryEngine.command(SQLQueryEngine.java:97)
	at com.arcadedb.database.EmbeddedDatabase.command(EmbeddedDatabase.java:1264)
	at com.arcadedb.server.ServerDatabase.command(ServerDatabase.java:421)
	at com.arcadedb.server.http.handler.PostCommandHandler.executeCommand(PostCommandHandler.java:127)
	at com.arcadedb.server.http.handler.PostCommandHandler.execute(PostCommandHandler.java:88)
	at com.arcadedb.server.http.handler.DatabaseAbstractHandler.execute(DatabaseAbstractHandler.java:92)
	at com.arcadedb.server.http.handler.AbstractHandler.handleRequest(AbstractHandler.java:127)
	at io.undertow.server.Connectors.executeRootHandler(Connectors.java:393)
	at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:859)
	at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
	at org.xnio.XnioWorker$WorkerThreadFactory$1$1.run(XnioWorker.java:1282)
	at java.base/java.lang.Thread.run(Thread.java:829)

Am I doing something wrong? Essentially, I'm trying to understand how to efficiently search with partial text matches. Actually, I need to do it from Gremlin, but for the moment, I'd like to make it work at least using SQL.

Steps to reproduce

See above

marco-brandizi avatar May 01 '23 22:05 marco-brandizi

Hi @marco-brandizi, welcome to ArcadeDB!

Have you tried this?

select * from `Person:Resource` where `label` like '%John%'

lvca avatar May 02 '23 09:05 lvca

Ciao @lvca, and thank you.

Sorry, I forgot to add that bit: yes, I tried LIKE 'John%' (I've used SQL for too many years, to forget it! :-) ), and it works, however, when I click on the Explain tab in the Studio, it doesn't seem to be using the index, either a string or a full text one:

+ FETCH FROM TYPE Person:Resource
  + FETCH FROM BUCKET 25 (Person%3AResource_0) ASC
  + FETCH FROM BUCKET 28 (Person%3AResource_1) ASC
  + FETCH FROM BUCKET 31 (Person%3AResource_2) ASC
  + FETCH FROM BUCKET 34 (Person%3AResource_3) ASC
  + FETCH FROM BUCKET 37 (Person%3AResource_4) ASC
  + FETCH FROM BUCKET 40 (Person%3AResource_5) ASC
  + FETCH FROM BUCKET 43 (Person%3AResource_6) ASC
  + FETCH FROM BUCKET 46 (Person%3AResource_7) ASC
+ FILTER ITEMS WHERE 
  `label` LIKE 'John%'
+ LIMIT ( LIMIT 25)
+ CALCULATE PROJECTIONS
  *

I also wonder how much and in which cases I need an index for these searches. I plan to do many equality-based searches, but not so frequent partial matches. In the latter case, it seems that it does some optimisation anyway, by looking at a limited number of buckets (like in a hash table?), rather than doing a full scan.

marco-brandizi avatar May 02 '23 09:05 marco-brandizi

@gramian I remember you played recently with those operators and full-text indexes. Any findings?

lvca avatar May 02 '23 09:05 lvca

Indeed, I had (have) the same issue with a (multi property) Full-text index. I am also using the wildcard search as workaround. I will append my issue to this one later.

gramian avatar May 02 '23 10:05 gramian

My problem was:

I was starting off this example: https://orientdb.com/docs/3.2.x/indexing/Full-Text-Index.html#two-minutes-tutorial but when using: SELECT FROM Item[text] WHERE text CONTAINSTEXT "sister" I get the error Cannot execute index query with text CONTAINSTEXT "sister".

But I don't know if it is related...

gramian avatar May 02 '23 20:05 gramian

I've double-checked the OrientDB codebase and there is no one test with CONTAINSTEXT... I guess this was in the old SQL parser and never ported. So it's missing since 2015 I guess...

The full text index is never used from LIKE, ILIKE and CONTAINSTEXT. I have to write it from scratch. Starting from LIKE, I could split the text with the character % and search for all the words (If no % is present, it's an exact match and the full-text index can't be used because it splits the text in words).

Example:

SELECT FROM Book WHERE abstract LIKE "%magic%"
  1. If a FULL-TEXT index is defined on Book.abstract field, then a lookup of "magic" is executed
  2. If the index is not defined, a full scan is executed looking for "magic" in the abstract text

With CONTAINSTEXT it's easier, just look for a certain keyword. We could take this chance to extend the CONTAINSTEXT operator to allow an array of strings (the text must contains all the keywords - in AND). Example:

SELECT FROM Book WHERE abstract CONTAINSTEXT ["magic","wand"]

lvca avatar May 06 '23 15:05 lvca

@marco-brandizi a temporary workaround is to use the EQUALS operator. Example:

select * from `Person:Resource` where `label` = 'John'

Should work. The equals uses the index and if it's FULL text, a full text search is executed. While I think this is not semantically intuitive/correct, waiting for the CONTAINSTEXT fix it would work.

lvca avatar May 06 '23 15:05 lvca

Thanks. So, you mean '=' finds "John Doe" too when there is a full text index?

marco-brandizi avatar May 06 '23 16:05 marco-brandizi

@marco-brandizi correct.

lvca avatar May 06 '23 22:05 lvca

I had a problem related to this where I was seeing the = operator return results when the value didn't exactly match which is explained by the above as I had a full text index on the property in question.

I found by swapping out = for <=> I could get my desired result because that seems to ignore the full text index.

alan-strickland-red avatar Jun 12 '23 09:06 alan-strickland-red