Pyorient Create_Record Passes Null Values to OrientDB When Unicode Object is Used Instead of String

Open JSv4 opened this issue 8 years ago • 2 comments

OrientDB Version, operating system, or hardware.

  • v2.2.1

Operating System

  • [X] Linux - Debian 8 x64

Expected behavior and actual behavior

Just for starters, I'm new to OrientDB and am not a python expert, so if this is an easy fix, my apologies and thanks in advance.

I have an OrientDB database with a class called Nodes. The Nodes class has the following properties:

  1. Abs_Address (String) [Also the index... no duplicates allowed]
  2. Content (String)
  3. Heading (String)
  4. Type (String)
  5. Value (String)
  6. Parent (Link -->Nodes)
  7. Children (Linkset -->Nodes)

I have a pyorient method to create a new node, and it has created literally thousands of them with no problems... so long as Abs_Address is not a unicode object. When my Abs_Address string contained the unicode character u"\u2013", Python treated it as type unicode rather than str, and I started getting a duplicate entry error from OrientDB. This was odd to me, so I traced it and found that pyorient's record_create method worked in my code unless the record included a unicode object. Whenever that happened, the record that OrientDB reported getting was full of null data. That explains the duplicate record error: every record with a unicode character (I assume; I can't check tens of thousands of possible records) was either passed by pyorient or interpreted by OrientDB as null. Once this happened more than once, the second all-null record collided with the first on the unique Abs_Address index, and the duplicate record error was thrown.
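Since checking tens of thousands of records by hand is not practical, a small helper can flag which fields of a record contain non-ASCII text before it is sent. This is only a sketch: the name non_ascii_fields and the sample record are hypothetical, and it is written for Python 3 (where every str is unicode) rather than the Python 2 used in the code below.

```python
def non_ascii_fields(record):
    """Return the names of string fields that contain non-ASCII characters."""
    return sorted(
        name
        for name, value in record.items()
        if isinstance(value, str) and any(ord(ch) > 127 for ch in value)
    )


# Hypothetical record; the address contains an en dash (U+2013).
sample = {
    "Abs_Address": "/u/c/2/a1\u20132",
    "Content": "plain text",
    "Heading": None,  # non-string values are ignored
}

print(non_ascii_fields(sample))
```

Running this over the records that failed would at least confirm whether non-ASCII content is the common factor.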

Here's the code I used to add records



def add_Node(self, Ono = None):

    if Ono!=None:

        new_Node = {'@Nodes':
                {
                    "Abs_Address":Ono.absolute_address,
                    'Content':Ono.content,
                    'Heading':Ono.heading,
                    'Type':Ono.type,
                    'Value':Ono.value
                }
        }

        try:
            print 'trying to add'
            rec_position = self.pyo_client.record_create(14, new_Node)  
            return rec_position

        except Exception as inst:
            print 'Exception:'
            print inst
            print 'Absolute address arg is'
            print Ono.absolute_address
            print self.get_Node_By_Address(Ono.absolute_address)
            print 'Return rid of new object'
            return self.get_Node_By_Address(Ono.absolute_address)[0] #this is what to use where we suspect the Abs_Address already exists... originally this only caught duplicate entry errors but I expanded it to figure out what's wrong with the code. 

    else:
        return 'NONE'

#where record already exists... get it by its absolute address
def get_Node_By_Address(self,Abs_Address=''):
    return self.pyo_client.query("SELECT * FROM Nodes WHERE Abs_Address='"+str(Abs_Address.replace(u"\u2013","-"))+"'")

For more info and some of the error message output, see this stackexchange post: http://stackoverflow.com/questions/39761684/general-python-unicode-ascii-casting-issue-causing-trouble-in-pyorient

Steps to reproduce the problem

As I said, the code above threw a duplicate key error anytime Abs_Address was type unicode. I found a way to get around this, but it's not a workable fix for me. Basically, I scanned for the specific unicode character causing trouble (an en dash, u"\u2013") and replaced it with a minus sign. This works, but there are several problems with it. First of all, I can't be sure this is the only unicode character I'll get in an address, so there may still be errors. Worse, I know other fields will likely have unicode string data, as there will be unusual symbols and such. Looking at the errors I was getting, it's clear that whenever OrientDB got something that was a unicode object, it saw it as a null object. My short-term fix is to check for the en dash, replace it with a minus sign, and then cast to type str(). This won't work long-term:

new_Node = {"@Nodes":
                        {
                            "Abs_Address":str(Ono.absolute_address.replace(u"\u2013","-")), # replace utf-8 symbol (ndash) to ascii (-)... WHY CAN'T I USE UTF-8
                            "Content":Ono.content.replace(u"\u2013","-"),
                            "Heading":Ono.heading.replace(u"\u2013","-"),
                            "Type":Ono.type.replace(u"\u2013","-"),
                            "Value":Ono.value.replace(u"\u2013","-")
                        }
        }
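To illustrate why this won't hold up, here is a rough sketch of the replace-then-cast approach breaking as soon as a different non-ASCII character shows up. It is written for Python 3, so .encode("ascii") stands in for the implicit ASCII encoding that str() performs on a unicode object in Python 2; the helper names to_ascii and survives are made up for this example.

```python
def to_ascii(text):
    # Stand-in for the workaround: swap the en dash for a minus sign,
    # then force the result to ASCII (what Python 2's str() does implicitly).
    return text.replace("\u2013", "-").encode("ascii")


def survives(text):
    """Return True if the workaround can convert the text, False otherwise."""
    try:
        to_ascii(text)
        return True
    except UnicodeEncodeError:
        return False


print(survives("/u/c/2/a1\u20132"))  # en dash: handled by the replace
print(survives("/u/c/2/a1\u20142"))  # em dash: not handled
print(survives("caf\u00e9"))         # accented letter: not handled
```

Every new character would need its own replace call, which is the open-ended list the workaround cannot keep up with.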

JSv4 avatar Sep 30 '16 13:09 JSv4

Don't attempt the "replace" method; it's a slippery slope. Instead, always encode your values before sending them to OrientDB.

I use the following for encoding values:

import json
def _escape(string):
    return json.dumps(string)[1:-1]

In your case, you could use it like this:

    new_Node = {'@Nodes':
            {
                "Abs_Address":_escape(Ono.absolute_address),
                'Content':_escape(Ono.content),
                'Heading':_escape(Ono.heading),
                'Type':_escape(Ono.type),
                'Value':_escape(Ono.value)
            }
   }

This will ensure that your values will be properly escaped.

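To query for encoded values, you will have to encode the value twice (that is, escape the already-escaped string), so that the backslashes added by the first pass are themselves escaped inside the query's string literal.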

result = client.query('SELECT * FROM V where Abs_Address="%s"' % _escape(_escape('/u/c/2/a1–2')))

To decode that returned value, you can do something like this:

assert  result[0].Abs_Address.encode('UTF-8').decode('unicode_escape') == '/u/c/2/a1–2'

anber500 avatar Oct 12 '16 05:10 anber500

You are my hero. I STRONGLY suggest that we include your method in the pyorient how-to / documentation. It makes tons of sense, but I wouldn't have thought of it on my own. I've seen other people having similar issues based on a few stackexchange posts... and I don't think anyone managed to help them.

EDIT: to be clearer, I would suggest that the how-tos / docs advise people to encode / decode data when storing to and fetching from the DB. Maybe I just missed it, but I didn't realize I should do that.

JSv4 avatar Oct 12 '16 14:10 JSv4