jamdb_oracle icon indicating copy to clipboard operation
jamdb_oracle copied to clipboard

Connection pool is exhausted, disconnects hang forever.

Open isaacsanders opened this issue 2 years ago • 17 comments

We are using 0.5.0 of the library, via Ecto.

We have an application that makes a query at the beginning of an HTTP request.

For whatever reason, making a number of requests in quick succession (not how our users are doing this, but it was how I could repro an error state) can cause connections in the connection pool to get stuck in the disconnection process.

I think this is because of https://github.com/erlangbureau/jamdb_oracle/blob/master/src/jamdb_oracle.erl#L25

I would like to lobby that we change the call timeout to be either configurable or not infinity. If the process for disconnection takes too long, I would prefer to "let it crash". My application will reconnect, and the Oracle DBAs downstream can handle a couple of ungraceful shutdowns, I'd argue.

I can make this PR, but if I do, it would likely be that I'd set the timeout to be 5000 milliseconds.

isaacsanders avatar Nov 15 '22 19:11 isaacsanders

Hi,

You can specify timeout in config:

config :friends, Friends.Repo, database: "friends", username: "user", password: "pass", hostname: "localhost", timeout: 500

Value will be passed on to connect fun: https://github.com/erlangbureau/jamdb_oracle/blob/master/lib/jamdb_oracle.ex#L70 https://github.com/erlangbureau/jamdb_oracle/blob/master/src/jamdb_oracle_conn.erl#L33

Try to find problem line by adding erlang:display to disconnect fun in jamdb_oracle_conn.erl

-spec disconnect(state()) -> {ok, [env()]}.
disconnect(State) ->
   erlang:display(100),
    disconnect(State, 1).

-spec disconnect(state(), timeout()) -> {ok, [env()]}.
disconnect(#oraclient{socket=Socket, env=Env, passwd=Passwd}, 0) ->
   erlang:display(101),
    sock_close(Socket),
   erlang:display(102),
    exit(Passwd, ok),
   erlang:display(103),
    {ok, Env};
disconnect(#oraclient{conn_state=connected, socket=Socket, env=Env, passwd=Passwd} = State, _Tout) ->
   erlang:display(104),
    _ = send_req(close, State),
   erlang:display(105),
    sock_close(Socket),
   erlang:display(106),
    exit(Passwd, ok),
   erlang:display(107),
    {ok, Env};
disconnect(#oraclient{env=Env}, _Tout) ->
   erlang:display(108),
    {ok, Env}.

vstavskyi avatar Nov 16 '22 20:11 vstavskyi

Infinite timeouts are a bug in Erlang. You should not use them in your library when you have user provided timeouts already passed in.

isaacsanders avatar Nov 16 '22 21:11 isaacsanders

In our case Timeout is mostly used in gen_tcp and has less effect in gen_server https://www.erlang.org/doc/man/gen_tcp.html#recv-3 https://www.erlang.org/doc/man/gen_server.html#call-2

In gen_server:call default timeout is 5000 Here timeout will be 5000

- call_infinity(Pid, stop).
+ gen_server:call(Pid, stop).

Btw, you can change timeout anytime calling internal command TIMEOUT https://github.com/erlangbureau/jamdb_oracle/blob/master/src/jamdb_oracle_conn.erl#L124

Ecto.Adapters.SQL.query(YourApp.Repo, "timeout", [5000])

vstavskyi avatar Nov 17 '22 08:11 vstavskyi

I’m trying to tell you that your library isn’t using it right. I have connection pool processes that are hanging in the disconnection function because of your timeout on infinity

isaacsanders avatar Nov 17 '22 12:11 isaacsanders

I know this is the case because the only change I made to fix this is the PR I’ve submitted

isaacsanders avatar Nov 17 '22 12:11 isaacsanders

Is this enough?

- call_infinity(Pid, stop).
+ gen_server:call(Pid, stop).

vstavskyi avatar Nov 17 '22 12:11 vstavskyi

commit

vstavskyi avatar Nov 17 '22 13:11 vstavskyi

You need to be able to pass in the user provided timeout, as well.

isaacsanders avatar Nov 18 '22 03:11 isaacsanders

Is timeout in config not working properly?

config :friends, Friends.Repo, database: "friends", username: "user", password: "pass", hostname: "localhost", timeout: 500

vstavskyi avatar Nov 18 '22 08:11 vstavskyi

It should be passed into the sql_query function as well. And then the gen_server call.

On Nov 18, 2022, at 2:49 AM, Mykhailo Vstavskyi @.***> wrote:

 Is timeout in config not working properly?

config :friends, Friends.Repo, database: "friends", username: "user", password: "pass", hostname: "localhost", timeout: 500

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.

isaacsanders avatar Nov 18 '22 14:11 isaacsanders

Please try stage branch commit

vstavskyi avatar Nov 19 '22 18:11 vstavskyi

Why make all of these commits when I have a PR already open for this?

isaacsanders avatar Nov 19 '22 19:11 isaacsanders

I'd like to close the issue with disconnect. Can you give me the working example? Disconnect on L82 doesn't work for me. https://github.com/erlangbureau/jamdb_oracle/blob/master/lib/jamdb_oracle.ex#L82

MyRepo.disconnect_all(1000) DBConnection.disconnect_all(pid, 1000) L82 is never reached.

Jamdb.Oracle.disconnect(nil, pid) ** (FunctionClauseError) no function clause matching in DBConnection.ConnectionPool.handle_call/3

Modified function is not working

  def disconnect(_err, s) do
    query(s, 'CLOSE')
    :ok 
  end

This is working Ecto.Adapters.SQL.query(YourApp.Repo, "close", [])

vstavskyi avatar Nov 20 '22 17:11 vstavskyi

My PR is to close the issue

isaacsanders avatar Nov 20 '22 18:11 isaacsanders

I used my actual working code to verify the change

isaacsanders avatar Nov 20 '22 18:11 isaacsanders

@vstavskyi Your example about the disconnect/2 function doesn't make sense to me. That code doesn't exist in master or in my PR.

isaacsanders avatar Dec 27 '22 16:12 isaacsanders

Your changes to propagate the connect timeout still doesn't respect the individual user's timeout parameter. I have updated my PR so that it can merge with master.

isaacsanders avatar Dec 27 '22 16:12 isaacsanders