infinite loop when connecting to orientdb that's running in distributed mode

Open anber500 opened this issue 8 years ago • 15 comments

I'm using orientDB v.2.2.7 on AWS in distributed mode.

When connecting using pyorient 1.5.4, the connection freezes up. Inside the OrientSerializationCSV class, in the "_parse_collection" method, the while loop never terminates.

It looks like "_parse_set(self, content)" can't really handle unexpected formatting. It seems very unreliable. If "_parse_value" returns the same content as what was passed into it, as per the last "else" statement, the entire "_parse_set" method ends up in an infinite loop.
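To make the failure mode concrete, here is a minimal sketch of a progress guard for this kind of loop. It is not pyorient's code: `parse_items`, `ParseStalled`, `comma_value` and `stuck_value` are hypothetical names, and the toy parsers only stand in for "_parse_value". The idea is simply that a step which returns its input unchanged should raise instead of being called again.

```python
class ParseStalled(Exception):
    """Raised when a parsing step consumes no input."""


def parse_items(content, parse_value):
    """Repeatedly apply parse_value until content is exhausted.

    parse_value is a hypothetical callable returning (value, remaining).
    """
    items = []
    while content:
        value, remaining = parse_value(content)
        # Guard: if the step did not shrink the input, looping again
        # would repeat the same call forever, so fail loudly instead.
        if len(remaining) >= len(content):
            raise ParseStalled("no progress at: %r" % content[:40])
        items.append(value)
        content = remaining
    return items


def comma_value(content):
    """Toy parser: consume up to and including the next comma."""
    head, _, tail = content.partition(",")
    return head, tail


def stuck_value(content):
    """Toy parser mimicking the fall-through case: input comes back unchanged."""
    return None, content
```

With a guard like this, an unexpected format would surface as an exception on connect rather than a frozen connection.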

From my tests, it looks like "_parse_set" is only called when connecting to a distributed database. In my case, the code tries to parse the following string:

<"MyDatabase">,
usedMemory: 143389672l,
freeMemory: 375589912l,
maxMemory: 518979584l,
latencies: (node-0AbxA4VVo9PcEmOC: (entries: 200l,
last: 788930l,
min: 0l,
max: 788930l,
average: 788930.0f,
total: 788930l,
firstExecution: 1473174585720l,
lastExecution: 1473176590139l,
lastReset: 1473176590139l,
lastResetEntries: 1l)),
messages: (heartbeat: 199l,
deploy_db: 1l)),
(id: 0,
uuid: "6c9ab948-547d-4e91-b2a2-0e8381d91f0e",
name: "node-0AbxA4VVo9PcEmOC",
startedOn: 1473174542365t,
status: "ONLINE",
connections: 2,
listeners: [
  {
    "protocol": "ONetworkProtocolBinary",
    "listen": "127.11.11.11:2424"
  },
  {
    "protocol": "ONetworkProtocolHttpDb",
    "listen": "127.11.11.11:2480"
  }
],
user_replicator: "4047485640755908123",
databases: <"MyDatabase">,
usedMemory: 145392192l,
freeMemory: 373587392l,
maxMemory: 518979584l,
latencies: (node7096-5453: (entries: 200l,
last: 815985l,
min: 0l,
max: 1488559l,
average: 983307.0f,
total: 3933231l,
firstExecution: 1473174604530l,
lastExecution: 1473176594562l,
lastReset: 1473176564562l,
lastResetEntries: 4l)),
messages: (heartbeat: 200l))
],
database: (autoDeploy: true,
hotAlignment: false,
executionMode: "asynchronous",
readQuorum: 1,
writeQuorum: 2,
failureAvailableNodesLessQuorum: false,
readYourWrites: true,
servers: (*: "master"),
clusters: (internal: (),
index: (),
*: (servers: [
"node-0AbxA4VVo9PcEmOC",
"node7096-5453",
"<NEW_NODE>"
]),
posts_base_35: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
posts_base_24: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
posts_base_34: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
posts_base_23: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
posts_base_6: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
posts_base_33: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
posts_base_7: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
posts_base_10: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
---------------------------------
Keeps going for each cluster
---------------------------------
post_9: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
]),
post_26: (servers: [
"node7096-5453",
"node-0AbxA4VVo9PcEmOC",
"<NEW_NODE>"
])),
version: 410)

"_parse_set" is not able to parse that set of values and results in the connection locking up. Could we not use another deserializer instead?

anber500 avatar Sep 06 '16 16:09 anber500

"_parse_collection" has a similar issue. Any while loop that makes use of "_parse_value" run the risk of infinite looping if the format is unexpected. "_parse_value" is not a friend to us :(

anber500 avatar Sep 06 '16 16:09 anber500

Rather than hacking away at the current parser, I want to use another deserializer instead but that would introduce a dependency into pyorient. I'm ok with that for my project but that would probably not be ideal for the pyorient project in general.

I'm proposing that instead of specifying a "serialization_type" on "connect", why not pass the serialisation class instead?

That way we can use our own implementation. The current OrientSerializationBinary is bogus anyways.

Also, can't we pass a serializer when instantiating pyorient? It's silly to pass it on "connect" when it might as well be passed to "db_open". In some cases, "connect" does not need to be called at all.

it could look like this:

client = pyorient.OrientDB('localhost', 2424, serializer=MyCustomSerializer)
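As a rough sketch of what that could mean, assuming nothing about pyorient's internals: the `Client` and `JsonSerializer` classes below are hypothetical stand-ins, not pyorient.OrientDB or any existing serializer. The client takes a serializer class, instantiates it once, and uses whatever `encode`/`decode` methods it provides.

```python
import json


class JsonSerializer:
    """Hypothetical serializer; any class with encode/decode would do."""

    def encode(self, record):
        return json.dumps(record, sort_keys=True).encode("utf-8")

    def decode(self, payload):
        return json.loads(payload.decode("utf-8"))


class Client:
    """Stand-in for a database client, not pyorient.OrientDB itself."""

    def __init__(self, host, port, serializer=JsonSerializer):
        self.host = host
        self.port = port
        # Accept a class and instantiate it once per client.
        self._serializer = serializer()

    def roundtrip(self, record):
        """Encode then decode a record, as a send/receive pair would."""
        return self._serializer.decode(self._serializer.encode(record))


client = Client("localhost", 2424, serializer=JsonSerializer)
```

A project could then plug in its own deserializer without pyorient itself taking on the extra dependency.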

anber500 avatar Sep 07 '16 18:09 anber500

I tried using the development branch with the new Binary serialization. It works perfectly with the servers running in distributed mode.

anber500 avatar Sep 13 '16 03:09 anber500

There is a bad memory leak in the development branch. Use with caution.

anber500 avatar Sep 15 '16 00:09 anber500

The memory leak is caused by "pyorient_native". It's a shame because the binary serializer implementation is much faster than the CSV one and it's the only option that works in distributed mode :(

Here is the code that's causing all the problems:

    clsname, data = pyorient_native.deserialize(content,
                                content.__sizeof__(), self.props)

anber500 avatar Sep 15 '16 10:09 anber500

The memory leak in pyorient_native has been resolved. Merging the binary serializer into the master branch is now a viable option for solving the distributed mode issue.

anber500 avatar Sep 21 '16 10:09 anber500

Here is a solution to the original bug. There is currently a development branch on pyorient that implements the missing binary serializer (OrientSerialization.Binary). I have another branch that has some fixes merged into it.

Install it with:

pip install https://github.com/anber500/pyorient/tarball/17f5e42e83859a661c6483f7fa812226194694dd#egg=pyorient

Set your serialiser as follows:

client = pyorient.OrientDB("localhost", 2424, serialization_type=pyorient.OrientSerialization.Binary)

You will also need an updated version of pyorient_native. The first release had a memory leak so use the version from the master branch:

pip install https://github.com/nikulukani/pyorient_native/tarball/master#egg=pyorient_native

This works perfectly on AWS in distributed mode and is much faster than the CSV serializer.

anber500 avatar Oct 02 '16 19:10 anber500

@anber500 could you make a regular pull request against the develop branch with your fixes, so everyone can benefit from these improvements instead of installing them in a weird way? Thanks a lot!!

mogui avatar Oct 04 '16 21:10 mogui

I would, but I didn't write any of the fixes. All the fixes have been stuck in pull requests for a while and we couldn't wait weeks for a fixed release. I only forked the code to merge a patch, given to me by @nikulukani, into the develop branch. I saw the code was missing from the develop branch and I patched it.

To be honest, I would prefer to pull from an official build but I missed 2 sprints waiting for this fix and I'm sure other developers are just as desperate for a fix.

anber500 avatar Oct 05 '16 00:10 anber500

I have also had some chats with the people from OrientDB. They have their own forked version of pyorient which they point to in their documentation. It looks like they "officially" support that fork but they don't. It would be really great if the fixes could make their way back upstream. As a newcomer to OrientDB, I must admit that it's very confusing to have to deal with all these forks when no one knows which ones work and which ones don't.

anber500 avatar Oct 05 '16 00:10 anber500

@anber500 ok, now it's merged into the develop branch. Yes, I know it's confusing; we have to clean up all the mess with forks and the like. We are chatting with people from OrientDB too to understand how to do it. It will be better, I promise :D

mogui avatar Oct 05 '16 08:10 mogui

We have experienced exactly the same problem here... Installing pyorient_native as you instructed worked fine (thanks!) but only in the Linux environment (in our development environment, Windows systems, pip wasn't able to install it because of a broken dependency, stdc++.lib). Hope to have this patch merged into the main pyorient branch soon! Regards, Joao

marigonda avatar Mar 20 '17 13:03 marigonda

@marigonda I think the branches have been merged. We honestly haven't run into this issue because we deploy via Docker. It looks like stdc++.lib is a MinGW dependency. Have you tried manually adding the lib to your environment?

anber500 avatar Mar 20 '17 23:03 anber500

Thanks for your reply. I still owe you an answer regarding stdc++.lib on Windows... btw, we will be going with the Linux environment anyway. About the merge: pip3.5 on CentOS 7 (Python 3.5 from the IUS repository) did not install the native version with binary deserialization support, so I had to do it manually, as you instructed earlier (thanks again). Now things are starting to work fine, but it seems we do not have OGM support with binary deserialization, do we? Thanks very much for your time!

marigonda avatar Mar 22 '17 19:03 marigonda

You can use the OGM with binary deserialization if you are using the develop branch of pyorient and have pyorient_native installed:

    from pyorient.ogm import Graph, Config
    from pyorient.serializations import OrientSerialization

    graph = Graph(Config.from_url('my_db', 'root', 'root', initial_drop=False,
                                  serialization_type=OrientSerialization.Binary))

nikulukani avatar Mar 23 '17 20:03 nikulukani