nebula icon indicating copy to clipboard operation
nebula copied to clipboard

feat: fn is_inversed, equivalent to `typeid(e) < 0`

Open wey-gu opened this issue 10 months ago • 11 comments

this is handy to check if an edge is scanned from its dst end or not.

What type of PR is this?

  • [ ] bug
  • [x] feature
  • [ ] enhancement

What problem(s) does this PR solve?

Issue(s) number:

Description:

The same edge could be treated as two because we persist one instance of edge in two places, see:

(root@nebula) [basketballplayer]> 
MATCH (v1:player{name:"Tim Duncan"})-[e]-(v2:player{name:"Tony Parker"}) 
RETURN e
+----------------------------------------------------+
| e                                                  |
+----------------------------------------------------+
| [:follow "player100"->"player101" @0 {degree: 95}] |
| [:follow "player101"->"player100" @0 {degree: 95}] |
+----------------------------------------------------+
Got 2 rows (time spent 5.819ms/105.497167ms)

With is_inversed, we could detect those edges that are not expected in many cases.

How do you solve it?

Just yet another function returns bool: typeid(e) < 0.

Special notes for your reviewer, ex. impact of this fix, design document, etc:

Document is needed.

Checklist:

Tests:

  • [x] Unit test(positive and negative cases)
  • [x] Function test
  • [ ] Performance test
  • [ ] N/A

Affects:

  • [x] Documentation affected (Please add the label if documentation needs to be modified.)
  • [ ] Incompatibility (If it breaks the compatibility, please describe it and add the label.)
  • [ ] If it's needed to cherry-pick (If cherry-pick to some branches is required, please label the destination version(s).)
  • [ ] Performance impacted: Consumes more CPU/Memory

Release notes:

Please confirm whether to be reflected in release notes and how to describe:

ex. Fixed the bug .....

wey-gu avatar Aug 26 '23 10:08 wey-gu

@yixinglu , any chance we could bypass this lint error, please?

.linters/cpp/cpplint.py src/common/function/FunctionManager.cpp 2>&1 | grep error
src/common/function/FunctionManager.cpp:2047:  Small and focused functions are preferred: FunctionManager::FunctionManager() has 1521 non-comment lines (error triggered by exceeding 500 lines).  [readability/fn_size] [2]

wey-gu avatar Aug 26 '23 11:08 wey-gu

@yixinglu , any chance we could bypass this lint error, please?

.linters/cpp/cpplint.py src/common/function/FunctionManager.cpp 2>&1 | grep error
src/common/function/FunctionManager.cpp:2047:  Small and focused functions are preferred: FunctionManager::FunctionManager() has 1521 non-comment lines (error triggered by exceeding 500 lines).  [readability/fn_size] [2]

Maybe you could bypass it by splitting the function into some small functions, or disabling the 'readability/fn_size' rule in linters/cpp/cpplint.py

yixinglu avatar Aug 29 '23 06:08 yixinglu

The failure now seems to be related to the instability of the test cases themself.

wey-gu avatar Sep 22 '23 05:09 wey-gu

A possible approach is to use both the node id and the edge type id to determine the edge direction. The syntax might look like this:

MATCH (v1:A)-[e]-(v2:B) RETURN v1, is_source_of(v1, e) AS edge_direction, v2

czpmango avatar Sep 27 '23 03:09 czpmango

Minor suggestion: It is better to commit the refactoring-related codes to another pr, which is also more review friendly.

czpmango avatar Sep 27 '23 03:09 czpmango

IMHO, the edge type id is an implementation level concept and cannot be used to simply indicate query semantics.

A possible approach is to use both the node id and the edge type id to determine the edge direction. The syntax might look like this:

MATCH (v1:A)-[e]-(v2:B) RETURN v1, is_source_of(v1, e) AS edge_direction, v2

Yeah, agreed, mixing thing here is a little bit twisted, the way we treated e in MATCH (v1:player{name:"Tim Duncan"})-[e]-(v2:player{name:"Tony Parker"}) RETURN e as two instances, it's kind of implementation exposure twisted with graph semantics, too? So we need such functions as a mitigation on top of that implementation?

is_source_of(v, e) is better and is a more scrupulous semantic definition, but sometimes we just don't have the v in the pattern to be referred :(.

Maybe we could add both is_source_of(v_or_vid, e) and this function?

Maybe we come out with a name that's more scrupulous than is_inversed?

Minor suggestion: It is better to commit the refactoring-related codes to another pr, which is also more review-friendly.

Agreed! I'll create a separate PR for the refactor to be merged before this change!

wey-gu avatar Sep 27 '23 04:09 wey-gu

is_source_of(v, e) is better and is a more scrupulous semantic definition, but sometimes we just don't have the v in the pattern to be referred :(. Maybe we could add both is_source_of(v_or_vid, e) and this function? Maybe we come out with a name that's more scrupulous than is_inversed?

I dont think you can judge the edge direction simply by the sign of edge type id.

czpmango avatar Sep 27 '23 06:09 czpmango

is_source_of(v, e) is better and is a more scrupulous semantic definition, but sometimes we just don't have the v in the pattern to be referred :(. Maybe we could add both is_source_of(v_or_vid, e) and this function? Maybe we come out with a name that's more scrupulous than is_inversed?

I dont think you can judge the edge direction simply by the sign of edge type id.

This is how I may need this sugar, here

WITH map{`true`: "-[", `false`: "<-["} AS arrow_l,
     map{`true`: "]->", `false`: "]-"} AS arrow_r
MATCH (s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
  WITH id(s) AS subj, [rel in e | [
     arrow_l[tostring(typeid(rel) > 0)] +
        tostring(rel.degree)+
     arrow_r[tostring(typeid(rel) > 0)],
     CASE typeid(rel) > 0
        WHEN true THEN dst(rel)
        WHEN false THEN src(rel)
     END
     ]
  ] AS rels
  WITH
      subj,
      REDUCE(acc = collect(NULL), l in rels | acc + l) AS flattened_rels
RETURN
  subj,
  REDUCE(acc = subj,l in flattened_rels|acc + ', ' + l) AS flattened_rels

Where do I need to construct a knowledge sequence from path/edges, without getting the sign of typeid, there were some bad cases.

wey-gu avatar Sep 27 '23 07:09 wey-gu

This is how I may need this sugar, here

WITH map{`true`: "-[", `false`: "<-["} AS arrow_l,
     map{`true`: "]->", `false`: "]-"} AS arrow_r
MATCH (s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
  WITH id(s) AS subj, [rel in e | [
     arrow_l[tostring(typeid(rel) > 0)] +
        tostring(rel.degree)+
     arrow_r[tostring(typeid(rel) > 0)],
     CASE typeid(rel) > 0
        WHEN true THEN dst(rel)
        WHEN false THEN src(rel)
     END
     ]
  ] AS rels
  WITH
      subj,
      REDUCE(acc = collect(NULL), l in rels | acc + l) AS flattened_rels
RETURN
  subj,
  REDUCE(acc = subj,l in flattened_rels|acc + ', ' + l) AS flattened_rels

Where do I need to construct a knowledge sequence from path/edges, without getting the sign of typeid, there were some bad cases.

How about that(probably be better performance):

MATCH p=(s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
RETURN id(s) AS subj, pathToString(p)
std::string pathToString(Path p, std::vector<std::string> propNames={}) {
   ...
}

czpmango avatar Sep 27 '23 08:09 czpmango

This is how I may need this sugar, here

WITH map{`true`: "-[", `false`: "<-["} AS arrow_l,
     map{`true`: "]->", `false`: "]-"} AS arrow_r
MATCH (s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
  WITH id(s) AS subj, [rel in e | [
     arrow_l[tostring(typeid(rel) > 0)] +
        tostring(rel.degree)+
     arrow_r[tostring(typeid(rel) > 0)],
     CASE typeid(rel) > 0
        WHEN true THEN dst(rel)
        WHEN false THEN src(rel)
     END
     ]
  ] AS rels
  WITH
      subj,
      REDUCE(acc = collect(NULL), l in rels | acc + l) AS flattened_rels
RETURN
  subj,
  REDUCE(acc = subj,l in flattened_rels|acc + ', ' + l) AS flattened_rels

Where do I need to construct a knowledge sequence from path/edges, without getting the sign of typeid, there were some bad cases.

How about that(probably be better performance):

MATCH p=(s)-[e:follow*..2]-() WHERE id(s) IN ["player100", "player101"]
RETURN id(s) AS subj, pathToString(p)
std::string pathToString(Path p, std::vector<std::string> propNames={}) {
   ...
}

make sense!

wey-gu avatar Sep 27 '23 12:09 wey-gu

@czpmango I think I need to implement pathToString! But for now, I still consider is_invsersed helpful, it's a self-explanation for our such behavior as again hit by #5779 :-D What do you think?

wey-gu avatar Dec 11 '23 04:12 wey-gu