age
age copied to clipboard
Odd behavior in context of orderability of different agtypes.
Describe the bug While evaluating the inequality between an edge and a path, it appears that the orderability is totally bypassed and true/false is returned based on the sign of the inequality.
How are you accessing AGE (Command line, driver, etc.)? Command line
What data setup do we need to do? Apache AGE, and any graph with atleast 2 vertices and 1 edge.
SELECT * from cypher('my_graph_name', $$
CREATE (:Person {name: 'John'})-[:Knows]->(:Person {name: 'Jonh'})
$$) as (a agtype);
What is the necessary configuration info needed? Any graph with atleast 2 vertices and 1 edge.
What is the command that caused the error?
In regards to orderability between different agtypes: - https://age.apache.org/age-manual/master/intro/comparability.html
I was testing the orderability between path and an edge, and I found this odd behavior: -
Comparing a path with an edge with p > e inequality: -
test=# SELECT *
FROM cypher('test', $$
WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
{id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
{id: 1, label: "label_name_2", properties: {}}::vertex
]::path as p
MATCH (n)-[e]-(m) RETURN p>e
$$) AS (e agtype);
e
------
true
true
true
true
(4 rows)
Fair enough, same result holds when we compare it with p < e inequality
test=# SELECT *
FROM cypher('test', $$
WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
{id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
{id: 1, label: "label_name_2", properties: {}}::vertex
]::path as p
MATCH (n)-[e]-(m) RETURN p<e
$$) AS (e agtype);
e
-------
false
false
false
false
(4 rows)
But look, when we change the inequality to e > p
test=# SELECT *
FROM cypher('test', $$
WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
{id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
{id: 1, label: "label_name_2", properties: {}}::vertex
]::path as p
MATCH (n)-[e]-(m) RETURN e>p
$$) AS (e agtype);
e
------
true
true
true
true
(4 rows)
We get true again, which is in direct contradiction to p > e inequality
And when we check for e < p, we get false again
test=# SELECT *
FROM cypher('test', $$
WITH [{id: 0, label: "label_name_1", properties: {i: 0}}::vertex,
{id: 2, start_id: 0, end_id: 1, label: "edge_label", properties: {i: 0}}::edge,
{id: 1, label: "label_name_2", properties: {}}::vertex
]::path as p
MATCH (n)-[e]-(m) RETURN e<p
$$) AS (e agtype);
e
-------
false
false
false
false
(4 rows)
Again, in direct contradiction to the first 2 results.
Expected behavior Would have expected p>e to be true under all scenarios.
Environment (please complete the following information):
- Version: [age 1.3.0]
The bug can be easily fixed by modifying the "static int get_type_sort_priority(enum agtype_value_type type)" function in agtype_util.c file. The reason for the bug is that there is no priority assigned for neither edge, nor path. The function returns -1 for both.
And because the value is checked like this
res = (get_type_sort_priority(va.type) <
get_type_sort_priority(vb.type)) ?
-1 :
1;
It means even if the function returns the same value for both the types, we get 1 (true) result for both edge > path and path > edge.
I would suggest modifying the function to declare priorities concretely.
@CapnSpek Then it needs to be discussed as to what the priority should be for those types. I would look to Neo4j for guidance.
It appears that neo4j has no orderability between different types.
In the sense that comparing an edge to a vertex, or an edge to a path, or a path to a vertex, or anything else, returns null.
No comparability between booleans and numerics either.
Or strings and numerics
If we wish to implement the ways of Neo4j, then orderability can be outright removed and an error can be thrown when 2 different types (except in the case of 2 numerics) are compared.
This behavior can be implemented by modifying function int compare_agtype_containers_orderability(agtype_container *a, agtype_container *b) in agtype_util.c Specifically near line 363 I believe.
However, it remains a choice of design. Whether we should have orderability between different types in AGE, or not.
@CapnSpek There was a document that covered the original reasoning a ways back. Unfortunately, I'm not able to find it anymore. Basically, orderability was chosen over comparability in order to allow for the sorting, grouping, of output.
The way the function get_type_sort_priority
is used should allow us to add in the others. However, we need to be aware that this may impact current regression tests and will impact future ones. As it will change the ordering, we need to think about what would be the ideal location for each of these additions, path and edge.
@CapnSpek Feel free to propose your ideas for adjustments to that function here.
Looking at the regression tests that relate to orderability in /regress/sql/agtype.sql line number 506
The order that that was originally intended and implemented is:-
-- Object < List < String < Boolean < Integer = Float = Numeric < Null
It does not mention anything about Edge, Vertex, Path, Array, Map.
Looking at how the regression tests are written, for example: -
SELECT agtype_in('[1,3,5,7,9,11]') < '"string"';
(line number 514)
And the fact that in agtype_util.c
array has a lower priority than string, it becomes obvious that 'List' was supposed to refer to 'Arrays' when they wrote 'List < String'
That solves the part Array < String < Boolean < Integer = Float = Numeric < Null
Leaving just object part to be specified The priority of vertex is already defined to be less than array and more than any other object in the code. (line 200 agtype_util.c)
if (type == AGTV_OBJECT)
return 0;
if (type == AGTV_VERTEX)
return 1;
if (type == AGTV_ARRAY)
return 2;
That gives us Object < Vertex < Array < String < Boolean < Integer = Float = Numeric < Null
If Edge, Path, and Map are all considered to be "objects" and assigned a priority, that would fix the issue.
Presuming that orderability was chosen over comparability in order to allow for the sorting, grouping, of output, then we can dissect the order like this
Null - Numbers - Booleans - Strings - Arrays - Graph elements - Other objects.
In that case: - Null -> Then numeric -> Then boolean -> Then string -> Then arrays -> Then vertices (graph element) Continuing this trend, it should go like -> Edges (graph element) -> Paths (collection of graph elements) -> Objects (Maps for example)
So in the end I recommend extending the current priority order to this
Object - 0 Path - 1 Edge - 2 Vertex - 3 Array - 4 String - 5 Boolean - 6 Numbers -7 Null - 8
This does not change anything in the current priority order, or the way it was intended to be, or the spirit from which it seems to arise from. It only adds further micro-priorities among previously undefined objects.
AGE with the updated priorities does pass all existing regression tests.
@CapnSpek I will ask others to review this as well.
Thanks for catching this inconsistency @CapnSpek. I think that what you proposed is quite a clean solution for this issue, without just scrapping orderability.
To add to the discussion on whether or not orderability should exist or not, I'd like to point out that even in PG, comparing different types throws an error. This could be something we consider in this overarching discussion how to handle this.
Cheers~
Thank you for your thoughts, maybe we can hold a vote regarding which direction should we take this issue in? With open feedback so everyone can express their thoughts about it.
Orderability is needed for sorting and for grouping. At least, that was the reason way, way, back.
@CapnSpek I think you can go ahead, assuming no objections, with creating a PR for your change. Others will have an opportunity to voice their opinions there as well.
My 2 cents.