textContains with label constraint doesn't work

Open ChenZhaobin opened this issue 4 years ago • 13 comments

When I use g.V().has('name', textContains('a')) or g.V().hasLabel('company') on its own, each works well, but if I combine the two conditions I always get no results. For example, g.V().hasLabel('company').has('name', textContains('a')) returns a count of zero, although all my vertex labels are company. Could someone help me out, please? This is really important for me.

ChenZhaobin avatar Aug 25 '20 07:08 ChenZhaobin

Are you using partitioned labels? In that case this looks like a duplicate of #1842 where you also already commented.

FlorianHockmann avatar Aug 25 '20 08:08 FlorianHockmann

Actually, I am using the docker-compose-cql-es.yml configuration, the same as here: https://github.com/JanusGraph/janusgraph-docker

ChenZhaobin avatar Aug 25 '20 08:08 ChenZhaobin

My question was whether you have created the vertex label company like this:

mgmt.makeVertexLabel('company').partition().make()

as #1842 mentions a problem with vertex label constraints if the label is partitioned.

Apart from that, is the problem specific to text predicates or does it also occur if you use a simple has step? So something like this:

g.V().has('company', 'name', 'test-company')

FlorianHockmann avatar Aug 25 '20 15:08 FlorianHockmann

@FlorianHockmann I created the vertex label company using Gremlin.Net like this: var company = g.AddV("company").Property("name", "IBM").Property("code", "010101").Next(). Combining hasLabel with a has filter works normally, and it also works with textContainsPrefix('I'), but textContains just doesn't work. textContains('I') and textContains('IB') return nothing; only textContains('IBM') finds one result, with name IBM, even though there are companies named 'IBMan' and 'IBManufacture' in my database.

ChenZhaobin avatar Aug 26 '20 01:08 ChenZhaobin

Hi @ChenZhaobin That's intended behavior. Have a look at the docs. It says:

textContains: is true if (at least) one word inside the text string matches the query string

So keep in mind textContains matches full words, not arbitrary substrings. That's why 'IBM' is found but 'IBMan' is not found. If you had an entry like 'IBM Manufacture', it would be found.
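To illustrate the word-level matching described here, a minimal Python sketch follows. It is only a simulation, not JanusGraph's or Elasticsearch's implementation: the tokenize helper (split on non-word characters, lowercase) is an assumption, and the real tokenization depends on the index backend and its analyzer. The function names text_contains and text_contains_prefix are hypothetical stand-ins for the predicates, and only single-word queries are handled.

```python
import re

def tokenize(text):
    # Assumption: split on non-word characters and lowercase each token.
    # The real analyzer is configured in the index backend and may differ.
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

def text_contains(text, query):
    # Word-level match: true if at least one whole token equals the query.
    return query.lower() in tokenize(text)

def text_contains_prefix(text, query):
    # Word-level prefix match: true if at least one token starts with the query.
    return any(tok.startswith(query.lower()) for tok in tokenize(text))

# Example values taken from the discussion above.
names = ["IBM", "IBMan", "IBManufacture", "IBM Manufacture"]

print([n for n in names if text_contains(n, "IBM")])
print([n for n in names if text_contains(n, "IB")])
print([n for n in names if text_contains_prefix(n, "IB")])
```

Under these assumptions, only 'IBM' and 'IBM Manufacture' match the query 'IBM', nothing matches 'IB' as a whole word, and all four names match 'IB' as a word prefix.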

rngcntr avatar Aug 28 '20 12:08 rngcntr

@rngcntr No, maybe the above is not a good example. My field actually consists of multiple Chinese characters, and the text is tokenized into words by the IK analyzer, which is installed as a plugin in Elasticsearch. g.V().has('name', textContains('one or more chinese character')) works normally, but the result list is empty when I use g.V().hasLabel('company').has('name', textContains('one or more chinese character')), which combines the indexed field with the label filter.

ChenZhaobin avatar Aug 31 '20 08:08 ChenZhaobin

@rngcntr @FlorianHockmann Finally, I solved this issue with the query below: g.V().hasLabel('company').filter{it.get().property('name').value().contains('one or more chinese character')}
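What this workaround does can be sketched in Python as follows. This is a rough simulation under stated assumptions, not how JanusGraph executes the traversal: the vertices list and the helpers has_label and filter_name_contains are hypothetical. The point is that the lambda filter checks each label-matched vertex in memory with a plain, case-sensitive substring test on the raw value, rather than going through the tokenized text index.

```python
# Hypothetical in-memory stand-in for a few vertices.
vertices = [
    {"label": "company", "name": "IBM"},
    {"label": "company", "name": "IBManufacture"},
    {"label": "person", "name": "IBMan"},
]

def has_label(vs, label):
    # Stand-in for hasLabel('company'): keep vertices with the given label.
    return [v for v in vs if v["label"] == label]

def filter_name_contains(vs, needle):
    # Stand-in for the lambda filter: case-sensitive substring test on the
    # raw property value, evaluated once per vertex.
    return [v for v in vs if needle in v.get("name", "")]

result = filter_name_contains(has_label(vertices, "company"), "IB")
print([v["name"] for v in result])
```

In this sketch the substring 'IB' matches both company vertices, which a word-level textContains would not. The trade-off is that every vertex passing the label filter is inspected one by one.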

ChenZhaobin avatar Aug 31 '20 09:08 ChenZhaobin

Nice to see your solution @ChenZhaobin! But I think the issue should stay open because the use of hasLabel should not impact the functionality of textContains.

rngcntr avatar Aug 31 '20 10:08 rngcntr

@rngcntr Reopened it. I guess this issue is related to a mixed index using ES with an analyzer other than the default.

ChenZhaobin avatar Aug 31 '20 10:08 ChenZhaobin

This issue has nothing to do with the custom analyzer; it is the same as the tickets below: https://github.com/JanusGraph/janusgraph/issues/1788 https://github.com/JanusGraph/janusgraph/issues/1379

@FlorianHockmann @porunov @pluradj Do we have a solution or a plan for this?

ChenZhaobin avatar Oct 10 '20 06:10 ChenZhaobin

The problem is that textContains does not (as the name implies) search for a substring; instead it searches for a word! What does that mean? Well, the value gets tokenized, and then the value is searched for the search term with a space in front of and after it. Completely badly documented. And the worst part: there is no alternative for searching for a substring.

Zonkodonko avatar Apr 06 '23 09:04 Zonkodonko

@Zonkodonko: Yes, textContains searches for words. That was however already described above.

How is that poorly documented? The docs state first that:

Text search predicates which match against the individual words inside a text string after it has been tokenized

and then for textContains:

is true if (at least) one word inside the text string matches the query string

(emphasis added)

If you think that the docs could be improved on this, then please open a new issue.

This issue is about the problem that:

the use of hasLabel should not impact the functionality of textContains.

which is also described in the issue description itself.

FlorianHockmann avatar Apr 06 '23 14:04 FlorianHockmann

@FlorianHockmann You are right. Somehow I didn't catch this while reading the documentation. It's just that the wording of the method and of the documentation is really confusing, especially if you read the Gremlin documentation first and think you know how it is supposed to work. Sorry for that.

Zonkodonko avatar Apr 06 '23 15:04 Zonkodonko