
Timeout errors when making lots of calls

Open pdilyard opened this issue 8 years ago • 9 comments

When many requests are made (in my case, synchronously), I eventually get the following error message:

** (exit) exited in: :gen_server.call(#PID<0.644.0>, {:operation, :command, [{:raw, "s"}, 194, {:raw, <<0, 0, 0, 1, 99>>}, {:raw, [<<0, 0, 0, 159, 67, 82, 69, 65, 84, 69, 32, 69, 68, 71, 69, 32, 104, 97, 115, 95, 100, 101, 109, 111, 103, 114, 97, 112, 104, 105, 99, 95, 114, 101, 115, 112, 111, 110, 115, 101, 10, 70, 82, ...>>, <<1>>, [<<0, 0, 0, 20>>, [0, [[<<0>>, ""], [[[[<<20>>, "parameters"], <<0, 0, 0, 19>>, 12]], 0, [[<<0>>, [], []]]]]]], <<0>>]}]}, 5000)
    ** (EXIT) time out
        (stdlib) gen_server.erl:212: :gen_server.call/3
    (marco_polo) lib/marco_polo.ex:831: MarcoPolo.refetching_schema/2
     (ex_orient) lib/ex_orient/db.ex:29: anonymous fn/3 in ExOrient.DB.command/2
       (poolboy) src/poolboy.erl:76: :poolboy.transaction/3
      (hydrogen) lib/migrator/agents_agentattribute.ex:12: anonymous fn/1 in Migrator.AgentsAgentAttribute.run/1
        (elixir) lib/enum.ex:610: anonymous fn/3 in Enum.each/2
        (elixir) lib/enum.ex:1478: anonymous fn/3 in Enum.reduce/3
        (elixir) lib/stream.ex:1227: anonymous fn/3 in Enumerable.Stream.reduce/3
        (elixir) lib/enum.ex:2744: Enumerable.List.reduce/3
        (elixir) lib/stream.ex:732: Stream.do_list_transform/9
        (elixir) lib/stream.ex:1247: Enumerable.Stream.do_each/4
        (elixir) lib/enum.ex:1477: Enum.reduce/3
        (elixir) lib/enum.ex:609: Enum.each/2
      (hydrogen) lib/migrator/all.ex:69: Migrator.All.migrate/1

pdilyard avatar Aug 10 '16 02:08 pdilyard

Hey there, do you have a minimal way of reproducing this?

whatyouhide avatar Aug 10 '16 09:08 whatyouhide

I've been able to replicate it most easily by inserting a lot of records, say 50,000, and then running a query to select them all. Even if you set your timeout to 60_000, the refetching_schema call will time out after 5 seconds.

Not sure if this matters or not.

Also keep in mind that this isn't something I would typically do, it's actually for a huge, one-time batch operation.
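
The mismatch described above (a 60_000 ms timeout passed by the caller, yet a `:gen_server.call/3` exit after 5000 ms in the stack trace) is what you would see if an inner call does not receive the caller's timeout and falls back to the `GenServer.call/3` default of 5000 ms. Whether that is what happens inside MarcoPolo is an assumption; the sketch below only illustrates the general pattern. `SlowServer` and `TimeoutDemo` are hypothetical names, and a 100 ms value stands in for the 5000 ms default so the example runs quickly.

```elixir
defmodule SlowServer do
  use GenServer

  # Hypothetical server whose only job is to answer slowly.
  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms), do: {:ok, delay_ms}

  @impl true
  def handle_call(:work, _from, delay_ms) do
    Process.sleep(delay_ms)
    {:reply, :done, delay_ms}
  end
end

defmodule TimeoutDemo do
  # Stands in for the 5_000 ms default of GenServer.call/3 (shortened for the demo).
  @default_timeout 100

  # Accepts a :timeout option but never forwards it to the inner call.
  def call_ignoring_opts(pid, _opts), do: safe_call(pid, @default_timeout)

  # Forwards the caller's :timeout option to the inner call.
  def call_forwarding_opts(pid, opts) do
    safe_call(pid, Keyword.get(opts, :timeout, @default_timeout))
  end

  # Turns the exit raised by a timed-out GenServer.call/3 into a tagged tuple.
  defp safe_call(pid, timeout) do
    {:ok, GenServer.call(pid, :work, timeout)}
  catch
    :exit, {:timeout, _details} -> {:error, :timeout}
  end
end

{:ok, pid} = SlowServer.start_link(300)

# The 2_000 ms option is dropped, so the call gives up after 100 ms.
IO.inspect(TimeoutDemo.call_ignoring_opts(pid, timeout: 2_000))

# The option is forwarded, so the call waits long enough for the reply.
IO.inspect(TimeoutDemo.call_forwarding_opts(pid, timeout: 2_000))
```

If something like this is the cause, raising the outer timeout alone would not help; the inner call would need to receive the same value.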

pdilyard avatar Aug 11 '16 15:08 pdilyard

I see, very interesting! Could you set up a small repo that I can use to reproduce this easily? I understand the process (it looks really straightforward) but I won't have time to look into it until the weekend, and a way to reproduce this right away would definitely be of help :)

whatyouhide avatar Aug 11 '16 20:08 whatyouhide

I can't seem to reproduce this consistently. It only seems to happen after I insert a lot of data into Orient. I've tried restarting the app, restarting Orient, etc. I can make queries like normal in the OrientDB interface, but I cannot make them through MarcoPolo, as the refetching_schema function repeatedly times out.

I'm trying to migrate data to Orient, and I can't seem to get a reliable result.

pdilyard avatar Aug 12 '16 21:08 pdilyard

So every time you make 50k INSERT queries and then a query to select all the inserted records you get this timeout?

whatyouhide avatar Aug 13 '16 16:08 whatyouhide

I've found that it's actually more of a random number of inserts, but basically, yes. I'm looking through the MarcoPolo source and don't really see why it even needs to fetch the schema of the database at all. Is there a way I could just disable this temporarily?

pdilyard avatar Aug 14 '16 00:08 pdilyard

@pdilyard that's how OrientDB works, the binary protocol isn't self-contained and for schemaful records it returns the type of their properties as "type ids" to be looked up in the database schema. 😕 It's not a great design to work with IMO, but it's what we have right now ¯_(ツ)_/¯
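
To make the "type ids" point concrete, here is a minimal sketch of why a client would need the schema to decode a record: fields arrive keyed by a numeric property id, and the name and type have to be looked up in a cached copy of the schema, refetching it when an id is unknown. This is not MarcoPolo's code or OrientDB's actual wire format; the module name, the `{id, value}` field shape, the schema map, and the refetch-on-miss behaviour are all assumptions made for illustration.

```elixir
defmodule SchemaLookupSketch do
  # Hypothetical cached schema: property id => {property name, property type}.
  # Hypothetical record fields: a list of {property id, raw value} pairs.

  # Resolves fields against the cached schema. On an unknown id, calls
  # `refetch` once to get a fresh schema and tries again.
  def resolve(fields, schema, refetch) when is_function(refetch, 0) do
    case lookup_all(fields, schema) do
      {:ok, resolved} ->
        {:ok, resolved, schema}

      {:error, {:unknown_property_id, _id}} ->
        fresh = refetch.()

        case lookup_all(fields, fresh) do
          {:ok, resolved} -> {:ok, resolved, fresh}
          error -> error
        end
    end
  end

  # Builds %{name => {type, value}}, stopping at the first id not in the schema.
  defp lookup_all(fields, schema) do
    Enum.reduce_while(fields, {:ok, %{}}, fn {id, value}, {:ok, acc} ->
      case Map.fetch(schema, id) do
        {:ok, {name, type}} -> {:cont, {:ok, Map.put(acc, name, {type, value})}}
        :error -> {:halt, {:error, {:unknown_property_id, id}}}
      end
    end)
  end
end

stale = %{0 => {"name", :string}}
fresh = %{0 => {"name", :string}, 1 => {"age", :integer}}

# Property id 1 is missing from the stale schema, so the sketch refetches once.
IO.inspect(SchemaLookupSketch.resolve([{0, "Ann"}, {1, 30}], stale, fn -> fresh end))
```

In a design like this, every decode that hits an unknown id costs an extra round trip for the schema, which is one plausible place for a timeout to surface.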

whatyouhide avatar Aug 14 '16 10:08 whatyouhide

Hmm. Well this is a huge blocker for me right now, I can't successfully get a migration script to work all the way through without this function timing out. Is there anything at all I can do about it?

pdilyard avatar Aug 14 '16 20:08 pdilyard

Sorry to hear that, but I won't be able to work on this very soon. I may be able to give it a shot this week, but I can't guarantee it. What you can do is try to debug this, maybe find the bug and, even better, a fix, and submit a PR :) That's open source at its finest! :P

whatyouhide avatar Aug 14 '16 23:08 whatyouhide