gqlalchemy icon indicating copy to clipboard operation
gqlalchemy copied to clipboard

[BUG] Non-Printable Characters Lead to 'Invalid query'

Open alexanderseik opened this issue 3 years ago • 7 comments

Memgraph version 2.3.1 GQLAlchemy 1.3.1

Describe the bug Non-printable characters like '\x13' lead to 'Invalid query'

To Reproduce

from gqlalchemy import Memgraph, Node, Field

db = Memgraph()

class Test(Node):
    name: str = Field()

Test(name='\x13').save(db)

Expected behavior The str is properly sanitized/escaped or the query 'just works'.

Logs

CREATE (node:Test)   SET node.name = '\x13'  RETURN node;
---
gqlalchemy\connection.py", line 93, in execute_and_fetch
    cursor.execute(query)
mgclient.DatabaseError: Invalid query.

Additional context I have raw data that I want to load into my graph. Some of my statements fail due to the error above. The error message is not very helpful so I had to debug a bit. I think it could be in the scope of this project to do escaping/sanitizing of non-printable characters, or maybe it is a bug in memgraph itself (Python does not throw any error if I print \x13 :-) ).

At least the error message needs some improvements and more context. Maybe you could print/log the query in case of an error? I also think it would be nice to feature toggle query logging in general.

BR, Alexander

alexanderseik avatar Jul 27 '22 08:07 alexanderseik

HI @alexanderseik, this is not a GQLAlchemy issue, but it is related to parsing in Memgraph. The \\ escape sequence tells the parser to put a single backslash in the string, so you can try:

from gqlalchemy import Memgraph, Node, Field

db = Memgraph()

class Test(Node):
    name: str = Field()

Test(name='\\x13').save(db)

Let me know if that works for you 😄

katarinasupe avatar Jul 27 '22 09:07 katarinasupe

Hi @katarinasupe,

this would definitely pass the query parser, but it would also change the content.

Also something like .replace() is not working, because \x13 is not \ x 1 3. I would have to remove the entire character. The issue is in the parser. I do not know if it is by design or not, so I placed a bug ticket. I thought it might be an issue of some missing/false sanitization.

Closing this issue and cleaning up my data ;-).

BR, Alexander

alexanderseik avatar Jul 27 '22 11:07 alexanderseik

Hi @alexanderseik,

I assume you want to store some plain binary data in Memgraph. Unfortunately we only support unicode strings. The reason behind that is we are doing our best to implement the openCypher specification. In the language reference it states that the STRING type is for unicode strings. Maybe you can open an issue about this in the main Memgraph repository, but I cannot promise anything more about it.

Benjamin

antaljanosbenjamin avatar Jul 27 '22 12:07 antaljanosbenjamin

HI @antaljanosbenjamin,

it is actually user input that in 2 instances contains non printable chars. \x13 is the same as \u0013 (same on codepage). I am using Python 3.9

from gqlalchemy import Memgraph, Node, Field

db = Memgraph()

class Test(Node):
    name: str = Field()

Test(name=u'this is an example\u0013some text here').save(db)
 # u'...' is not needed as far as I know python 3 is default utf-8, but I am no python expert

This yields

CREATE (node:Test)   SET node.name = 'this is an example\x13some text here'  RETURN node; <-- becomes \x representation again
Invalid query.

The same issue is for e.g. \u0010, \u0011, etc.

For example

Test(name=u'\u0041').save(db)

In this case it works:

CREATE (node:Test)   SET node.name = 'A'  RETURN node;

As far as I understand it is an UTF-8 String that I try to enter in both cases.

I can manually remove these characters and I am ok with that. I just wanted to report this issue :-). Maybe you could open an issue at the main Repo if you think it is worth looking into :-). Or maybe the issue is somewhere in-between, e.g. mgclient?

BR Alexander

alexanderseik avatar Jul 27 '22 14:07 alexanderseik

First of all, @alexanderseik you are completely right. \x13 is also a unicode character, even though it is not printable. I don't know how I concluded my previous conclusion, but thanks for correcting me. I did have another look on this. Seems like gqlalchemy is doing something tricky here, because the same query works with pymgclient:

>>> import mgclient
>>> 
>>> connection = mgclient.connect(host="localhost", port=7687)
>>> cursor = connection.cursor()
>>> cursor.execute("CREATE (node:Test)   SET node.name = 'this is an example\x13some text here'  RETURN node;")
>>> print(cursor.fetchall())
[(<mgclient.Node(id=7, labels={'Test'}, properties={'name': 'this is an example\x13some text here'}) at 0x7f09dd0cb210>,)]

The related logs from Memgraph:

[2022-07-29 13:39:27.431] [memgraph_log] [debug] [Run] 'CREATE (node:Test)   SET node.name = 'this is an examplesome text here'  RETURN node;', query length is 86

Note that in python the non-printable character is printed using the \x00 format, so you can see that, however in Memgraph the non-printable character is not visible. I think this is acceptable. However Memgraph stores the non-printable value without any issue, it is there is the returned node.

Executing the same query with gqlalchemy we got the error you mentioned:

from gqlalchemy import Memgraph, Node, Field

db = Memgraph()

class Test(Node):
    name: str = Field()

Test(name="this is an example\x13some text here").save(db)

Memgraph logs:

[2022-07-29 13:44:22.897] [memgraph_log] [debug] [Run] 'CREATE (node:Test)   SET node.name = 'this is an example\x13some text here'  RETURN node;', query length is 89
[2022-07-29 13:44:22.897] [memgraph_log] [trace] Error message: Invalid query.

Note that the query sent by gqlalchemy is 89 character long, while pymgclient only send a query with 86 characters. So I think gqlalchemy somehow uses the same mechanism as printing a string, thus the character with binary value 0x13 is converted into an 4 character long string representation (\x13) which sent to Memgraph. Memgraph cannot understand the \x escape sequence, so it says the query is invalid.

As a result I reopen this issue. @katarinasupe please have a look at this.

antaljanosbenjamin avatar Jul 29 '22 11:07 antaljanosbenjamin

@antaljanosbenjamin thank you for your response and further looking into this issue!

I want to add that if I run:

CREATE (node:Test)   SET node.name = 'this is an example\x13some text here'  RETURN node;

in Memgraph Lab I also get an 'Invalid query' response. Here I think it is because of the unescaped '\', but I am not completely sure.

In general I would suggest (to memgraph/memgraph) to improve the invalid query response and provide more detail.

BR, Alexander

alexanderseik avatar Jul 29 '22 12:07 alexanderseik

I talked with the guys for Memgraph Lab. Actually neither Memgrgaph Lab nor mgconsole meant to handle "raw bytes". However you can use the \u0013 form for both of them. For mgconsole the printing will be a bit strange, but it works:

memgraph> CREATE (node:Test)   SET node.name = 'this is an example\u0013some text here'  RETURN node;
+-----------------------------------------------------+
| node                                                |
+-----------------------------------------------------+
| (:Test {name: "this is an examplesome text here"}) |
+-----------------------------------------------------+
1 row in set (0.001 sec)
1 labels have been created.
1 nodes have been created.
1 properties have been updated.

For Memgraph Lab the visualization might be also a bit strange, but the test it written and read correctly, check the second screenshot. image

image

antaljanosbenjamin avatar Jul 29 '22 14:07 antaljanosbenjamin