age icon indicating copy to clipboard operation
age copied to clipboard

Odd behavior in context of orderability of different agtypes.

Open CapnSpek opened this issue 1 year ago • 12 comments

Describe the bug While evaluating the inequality between an edge and a path, it appears that the orderability is totally bypassed and true/false is returned based on the sign of the inequality.

How are you accessing AGE (Command line, driver, etc.)? Command line

What data setup do we need to do? Apache AGE, and any graph with atleast 2 vertices and 1 edge.

SELECT * from cypher('my_graph_name', $$
  CREATE (:Person {name: 'John'})-[:Knows]->(:Person {name: 'Jonh'})
$$) as (a agtype);

What is the necessary configuration info needed? Any graph with atleast 2 vertices and 1 edge.

What is the command that caused the error?

In regards to orderability between different agtypes: - https://age.apache.org/age-manual/master/intro/comparability.html

I was testing the orderability between path and an edge, and I found this odd behavior: -

Comparing a path with an edge with p > e inequality: -

test=# SELECT *
FROM cypher('test', $$
        WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
            {id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
           {id: 1, label: "label_name_2", properties: {}}::vertex
           ]::path as p
MATCH (n)-[e]-(m) RETURN p>e
$$) AS (e agtype);
  e   
------
 true
 true
 true
 true
(4 rows)

Fair enough, same result holds when we compare it with p < e inequality

test=# SELECT *
FROM cypher('test', $$
        WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
            {id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
           {id: 1, label: "label_name_2", properties: {}}::vertex
           ]::path as p
MATCH (n)-[e]-(m) RETURN p<e
$$) AS (e agtype);
   e   
-------
 false
 false
 false
 false
(4 rows)

But look, when we change the inequality to e > p

test=# SELECT *
FROM cypher('test', $$
        WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
            {id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
           {id: 1, label: "label_name_2", properties: {}}::vertex
           ]::path as p
MATCH (n)-[e]-(m) RETURN e>p
$$) AS (e agtype);
  e   
------
 true
 true
 true
 true
(4 rows)

We get true again, which is in direct contradiction to p > e inequality

And when we check for e < p, we get false again

test=# SELECT *
FROM cypher('test', $$
        WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
            {id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
           {id: 1, label: "label_name_2", properties: {}}::vertex
           ]::path as p
MATCH (n)-[e]-(m) RETURN e<p
$$) AS (e agtype);
   e   
-------
 false
 false
 false
 false
(4 rows)

Again, in direct contradiction to the first 2 results.

Expected behavior Would have expected p>e to be true under all scenarios.

Environment (please complete the following information):

  • Version: [age 1.3.0]

CapnSpek avatar Apr 28 '23 18:04 CapnSpek

The bug can be easily fixed by modifying the "static int get_type_sort_priority(enum agtype_value_type type)" function in agtype_util.c file. The reason for the bug is that there is no priority assigned for neither edge, nor path. The function returns -1 for both.

And because the value is checked like this

res = (get_type_sort_priority(va.type) <
                   get_type_sort_priority(vb.type)) ?
                      -1 :
                      1;

It means even if the function returns the same value for both the types, we get 1 (true) result for both edge > path and path > edge.

I would suggest modifying the function to declare priorities concretely.

CapnSpek avatar Apr 28 '23 22:04 CapnSpek

@CapnSpek Then it needs to be discussed as to what the priority should be for those types. I would look to Neo4j for guidance.

jrgemignani avatar Apr 28 '23 22:04 jrgemignani

It appears that neo4j has no orderability between different types. In the sense that comparing an edge to a vertex, or an edge to a path, or a path to a vertex, or anything else, returns null. image

image image

No comparability between booleans and numerics either. image

Or strings and numerics image

If we wish to implement the ways of Neo4j, then orderability can be outright removed and an error can be thrown when 2 different types (except in the case of 2 numerics) are compared.

CapnSpek avatar Apr 29 '23 11:04 CapnSpek

This behavior can be implemented by modifying function int compare_agtype_containers_orderability(agtype_container *a, agtype_container *b) in agtype_util.c Specifically near line 363 I believe.

However, it remains a choice of design. Whether we should have orderability between different types in AGE, or not.

CapnSpek avatar Apr 29 '23 11:04 CapnSpek

@CapnSpek There was a document that covered the original reasoning a ways back. Unfortunately, I'm not able to find it anymore. Basically, orderability was chosen over comparability in order to allow for the sorting, grouping, of output.

The way the function get_type_sort_priority is used should allow us to add in the others. However, we need to be aware that this may impact current regression tests and will impact future ones. As it will change the ordering, we need to think about what would be the ideal location for each of these additions, path and edge.

jrgemignani avatar May 01 '23 16:05 jrgemignani

@CapnSpek Feel free to propose your ideas for adjustments to that function here.

jrgemignani avatar May 03 '23 00:05 jrgemignani

Looking at the regression tests that relate to orderability in /regress/sql/agtype.sql line number 506 The order that that was originally intended and implemented is:- -- Object < List < String < Boolean < Integer = Float = Numeric < Null

It does not mention anything about Edge, Vertex, Path, Array, Map.

Looking at how the regression tests are written, for example: - SELECT agtype_in('[1,3,5,7,9,11]') < '"string"'; (line number 514) And the fact that in agtype_util.c array has a lower priority than string, it becomes obvious that 'List' was supposed to refer to 'Arrays' when they wrote 'List < String'

That solves the part Array < String < Boolean < Integer = Float = Numeric < Null

Leaving just object part to be specified The priority of vertex is already defined to be less than array and more than any other object in the code. (line 200 agtype_util.c)

if (type == AGTV_OBJECT)
        return 0;
    if (type == AGTV_VERTEX)
        return 1;
    if (type == AGTV_ARRAY)
        return 2;

That gives us Object < Vertex < Array < String < Boolean < Integer = Float = Numeric < Null

If Edge, Path, and Map are all considered to be "objects" and assigned a priority, that would fix the issue.

Presuming that orderability was chosen over comparability in order to allow for the sorting, grouping, of output, then we can dissect the order like this

Null - Numbers - Booleans - Strings - Arrays - Graph elements - Other objects.

In that case: - Null -> Then numeric -> Then boolean -> Then string -> Then arrays -> Then vertices (graph element) Continuing this trend, it should go like -> Edges (graph element) -> Paths (collection of graph elements) -> Objects (Maps for example)

So in the end I recommend extending the current priority order to this

Object - 0 Path - 1 Edge - 2 Vertex - 3 Array - 4 String - 5 Boolean - 6 Numbers -7 Null - 8

This does not change anything in the current priority order, or the way it was intended to be, or the spirit from which it seems to arise from. It only adds further micro-priorities among previously undefined objects.

CapnSpek avatar May 12 '23 11:05 CapnSpek

AGE with the updated priorities does pass all existing regression tests.

image

CapnSpek avatar May 12 '23 11:05 CapnSpek

@CapnSpek I will ask others to review this as well.

jrgemignani avatar May 17 '23 17:05 jrgemignani

Thanks for catching this inconsistency @CapnSpek. I think that what you proposed is quite a clean solution for this issue, without just scrapping orderability.

To add to the discussion on whether or not orderability should exist or not, I'd like to point out that even in PG, comparing different types throws an error. This could be something we consider in this overarching discussion how to handle this.

Cheers~

dehowef avatar May 17 '23 18:05 dehowef

Thank you for your thoughts, maybe we can hold a vote regarding which direction should we take this issue in? With open feedback so everyone can express their thoughts about it.

CapnSpek avatar May 31 '23 18:05 CapnSpek

Orderability is needed for sorting and for grouping. At least, that was the reason way, way, back.

@CapnSpek I think you can go ahead, assuming no objections, with creating a PR for your change. Others will have an opportunity to voice their opinions there as well.

My 2 cents.

jrgemignani avatar Jun 06 '23 17:06 jrgemignani